FOREWORD

This report is a continuation of SRC-126-A-67-59, entitled "Some Results in A Theory of Problem Solving," which contained the first three chapters of a book to be published by American Elsevier Publishing Company, Inc.

In the previous report it was mentioned that the chapters of the book following Chapter III would not be published in report form. Nevertheless, the present report is being published for two reasons. One is that the chapters reported here contain much material which did not appear in previous publications. These additions consist of both formalizations and interpretations of previous work.

Another reason for publishing this report is that, on seeing the first report, many people on our mailing list have asked me to "send Chapters IV and V as soon as they are written." This report enables us to do just that. As a result, this report is merely the first draft of these chapters. I believe that the final version, both of this report and of the previous one, will be much more readable than these.

The work reported on here was carried out by myself and graduate students supported by the U. S. Air Force Office of Scientific Research under grants AF-OSR-125-67 and AF-OSR-125-65, and by the National Science Foundation under grants GK 1386, GK 185 and GP 658. Some of the graduate students whose work is reported here were supported under NSF and Case Fellowships.

A word of apology has to be added regarding the format of this report. It starts on "page 164" and "Chapter IV." The main reason is time pressure. The Table of Contents shows a similar peculiarity. The page numbers shown for Chapters I - III are only approximate. The contents of Chapters I and II are exactly as in the previous report. In Chapter III, what is called Section 9 in the Table of Contents did not appear in the previous report; what is called Section 10 in that chapter appeared as Section 9 in the previous report. The section called Section 9 of Chapter II in the Table of Contents appears as an appendix to the present report. The page numbers shown in the Table of Contents for Chapters IV and V refer to the page numbers of this present report.
CHAPTER IV - DESCRIBING PATTERNS

1. Introduction

The importance of pattern recognition to the solution of problems and games was discussed briefly in Section 3 of Chapter I. In Chapters II and III, whenever sets like T', W_i, K_i, the blocks of E_i, or the kernels of functions like Q were discussed, it was implicitly or explicitly stated that the use of these sets in the construction of solution methods is practicable if and only if they can be described efficiently.

This chapter will be devoted entirely to the discussion of descriptions, and of the way the efficiency of a description depends on the set being described and the language used for describing it. Precise definitions will be attempted for the ideas and terms involved. Not all the motivations for the formalisms introduced will be repeated here, since some of them have already been discussed in Chapter I. Specifically, the reader
[...] of these expressions as denoting sets. For this latter to be possible, it is necessary that some of the syntactic entities be predicates defining certain "intuitively recognizable" sets of objects in the Universe of Discourse. In addition, the syntax has to have various ways of combining predicates to yield compound statements. These compound statements will denote sets which are uniquely related to the intuitively recognizable sets in a way dictated by the structure of the compound statements.

The first few sections of this chapter will be devoted to the development of some formal definitions and then to some specific languages which are meaningful in a wide variety of Universes of Discourse. The major emphasis will be on the "efficiencies" of these languages.

By the efficiency of a language for the description of a given set will be meant the "size" (in some sense) of the "shortest" expression which denotes that set. This size depends on the set as well as on the predicates of the language and the repertoire of combination modes available. It will be taken for granted that new predicates can be defined for enriching the language. That is, some compound statements may be replaced by shorter expressions by defining new syntactic entities in the language.

The definition of the word "size" was kept purposefully vague in the last paragraph, because a precise definition (being heavily dependent on technology) is hard to give in absolute terms. In a very rough way one may say that the size of an expression is measured by the number of symbols in it.

In the author's own thinking (for reasons which will be clarified in the proper context) certain symbols in an expression (like the "or" of Propositional Calculus) seem to have greater size than others (like "and").
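To make "size" concrete, here is a minimal Python sketch under explicit assumptions: the language consists only of named predicates (each denoting a fixed subset of a finite Universe) and the connectives "not", "and", "or"; every predicate name costs one unit; each connective costs one unit by default, and the caller may make "or" heavier, in the spirit of the remark above. The search builds expressions in order of increasing size and keeps only the first (hence shortest) expression found for each denoted set, so the first expression denoting the target is a shortest one. The function name `shortest_description` and the weighting scheme are hypothetical illustrations, not the author's procedure.

```python
from itertools import product


def shortest_description(universe, predicates, target, max_size=9, weights=None):
    """Return (size, text) of a smallest expression denoting `target`, or None.

    Each predicate name counts 1 toward the size; each connective counts
    weights[op] (default 1 for "not", "and", "or"; must be positive integers).
    """
    cost = {"not": 1, "and": 1, "or": 1}
    cost.update(weights or {})
    universe = frozenset(universe)
    best = {}     # denotation -> (size, text); the first hit is a shortest one
    by_size = {}  # size -> {denotation: text} for entries first found at that size

    def record(size, denotation, text):
        if denotation not in best:
            best[denotation] = (size, text)
            by_size.setdefault(size, {})[denotation] = text

    for name, extension in predicates.items():
        record(1, frozenset(extension), name)

    for size in range(2, max_size + 1):
        # "not E", where E has size (size - cost of "not").
        for den, text in list(by_size.get(size - cost["not"], {}).items()):
            record(size, universe - den, f"not {text}")
        # "(E1 op E2)": split the remaining budget between the two operands.
        for op, combine in (("and", frozenset.__and__), ("or", frozenset.__or__)):
            budget = size - cost[op]
            for left in range(1, budget):
                lhs = list(by_size.get(left, {}).items())
                rhs = list(by_size.get(budget - left, {}).items())
                for (d1, t1), (d2, t2) in product(lhs, rhs):
                    record(size, combine(d1, d2), f"({t1} {op} {t2})")
    return best.get(frozenset(target))
```

With U = {1, 2, 3, 4}, red = {1, 2} and big = {1, 3}, the set {1} is described by "(red and big)" at size 3; if "or" is given weight 2, "(red or big)" costs 4 instead of 3, so the same set becomes "more expensive" to describe without any change in what it denotes.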

The discussion above has been limited to the nature of statements which describe "given" sets. If by "given set" one means a set for which a description is available, then the problem of obtaining a short description turns out to be a trivial one, or at least a problem of transliteration. However, if by a "given set" one means a set whose elements are all available as a list, then one can consider the problem of generating a succinct statement in a language which will be satisfied by every element of the list and by none else. This, roughly, is the problem of "concept learning."

Since such lists are impossibly large in practical cases, one may instead consider a case where only some members of a set are exhibited on a short list. This, however, can give no meaningful clue to a learning program: one can infer, without any contradiction from the presentation, that every object belongs to the set. It is essential that at least some members of the complement of the set are also exhibited in another short list. One can then consider the problem of generating a succinct statement which will denote some set which contains every element of the first list and none of the elements of the second list. Typically, such a statement will be satisfied by certain objects which do not appear on either list. Thus, the expression will have "generalized" on the examples given. The mode of this generalization will depend on the method used for generating the describing expressions and, to a certain extent, on the language, since the language determines the succinctness of the statements. However, the "correctness" of the resulting generalization (whether the description actually denotes the set one "had in mind" in constructing the lists) is not at all determinable from the method of description generation alone.
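The problem just stated can be illustrated with a deliberately simple generalizer, which is not one of the algorithms of the next chapter. This Python sketch assumes that each example object is given by its value for every property, and it proposes the most specific description of one restricted form: "for each property, the value is one of those seen among the positive examples," omitting properties on which every value was seen. If some negative example satisfies that description, the sketch simply reports failure (None) instead of searching further. The names `learn_conjunction` and `satisfies` are hypothetical.

```python
def satisfies(obj, description):
    """True if obj's value of every restricted property is an allowed one."""
    return all(obj[prop] in allowed for prop, allowed in description.items())


def learn_conjunction(positives, negatives, domains):
    """Most specific description of the form AND over P of (P(x) in allowed[P]).

    positives, negatives: lists of objects, each a dict property -> value.
    domains: dict property -> set of all values that property can take.
    Returns {property: frozenset of allowed values}, omitting properties on
    which every value was seen, or None if some negative example satisfies it.
    """
    description = {}
    for prop, values in domains.items():
        seen = frozenset(obj[prop] for obj in positives)
        if seen != frozenset(values):  # this property actually restricts something
            description[prop] = seen
    if any(satisfies(obj, description) for obj in negatives):
        return None
    return description
```

With a red big ball and a red small cube as positive examples and a blue big ball as the negative example, the result is "color is red"; a small red ball, which appears on neither list, satisfies it. That is exactly the kind of generalization whose "correctness" cannot be settled by the method itself.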

The next chapter will describe certain algorithms for generalizations of the restricted ("not necessarily correct") variety. It will also discuss the possible situations under which generalizations may turn out to be correct.

Rough definitions of a few more terms may be useful for the reduction of confusion. In the literature, the term "pattern recognition" is used in two different senses. In one sense it stands for what has been called generalization above; in this book "Pattern Learning," "Concept Learning," or simply "Learning" will often be used to signify the same phenomenon. In the other sense the term "Pattern Recognition" means the recognition of an object as belonging to a pattern of known description. In this book the terms "Pattern Recognition," "Recognition" and "Object Recognition" will all be used in this sense.

Another term, "Concept Formation," is often used for what has been called "Concept Learning" above. In the next chapter a much more complex phenomenon will be called "concept formation."

The word "concept" will often be used for the word "pattern" in this book. The reason for this is historical: the initial models and languages developed at Case [2, 7, 28] were developed for understanding the psychological process of concept formation [19]. The relevance of the ideas to the field of pattern recognition was realized only later. This realization immediately led to the need for further developments of the formalisms. It is the author's belief that these further developments have made the theory even more relevant to psychology than it was before. The theory, in its present form, therefore, will use both terms.
2. Some Basic Terms and Discussions

The present section will formalize some of the basic ideas referred to in Section 3 of Chapter I, to initiate the discussion.

A pattern recognition environment (called an environment for short) is an ordered pair <U, 𝒫>, where U is an abstract set and 𝒫 is a family of non-trivial partitions of U. In much of what follows, the family 𝒫 and each of its elements will be considered to be finite, although some of the definitions and results are meaningful even if some elements of 𝒫 have infinitely many blocks.

U will be referred to as the Universe of Discourse (or Universe for short). Each element of 𝒫 will be called a Property. If P is a property, then each element p ∈ P (where p clearly is a subset of U) will be called a value of P.
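As an illustration of these definitions, the following Python sketch represents a finite environment <U, 𝒫> as a set U together with a dictionary of properties, each property being a dictionary from value names to subsets of U, and checks that every property is a non-trivial partition. Reading "non-trivial" as "at least two blocks, none of them empty" is an assumption, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable

Property = Dict[str, FrozenSet[Hashable]]  # value name -> subset of U


def is_nontrivial_partition(universe: Iterable[Hashable], prop: Property) -> bool:
    """True if the values are at least two non-empty, pairwise disjoint sets covering U."""
    blocks = list(prop.values())
    if len(blocks) < 2 or any(not block for block in blocks):
        return False
    covered = set()
    for block in blocks:
        if covered & block:  # two values share an element
            return False
        covered |= block
    return covered == set(universe)


def is_environment(universe: Iterable[Hashable], properties: Dict[str, Property]) -> bool:
    """<U, P> is an environment when every property is a non-trivial partition of U."""
    universe = set(universe)
    return all(is_nontrivial_partition(universe, p) for p in properties.values())
```

For example, with U = {1, ..., 6}, the properties color = {red: {1, 2, 3}, blue: {4, 5, 6}} and size = {big: {1, 4}, small: {2, 3, 5, 6}} form an environment, whereas a single block {U} or two overlapping "values" do not.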

The reader will notice that the word "value" is used for certain pre-defined subsets of the Universe of Discourse, while in most of the mathematical literature the word "property" is used in this sense. However, it may be worthwhile to recall that the property "redness" and the property "color" are two distinct things. "Redness" is a property in the usual mathematical sense. But "color" is also referred to as a property in common parlance: a fact one would like to recognize in the theory. Psychologists use the words "characteristic" or "dimension" [29] instead of the word "property."

A concept (or a pattern) is defined recursively as follows:

(Ci) A value of a property is a concept.

(Cii) If A and B are concepts, then A ∪ B is a concept.

(Ciii) If A is a concept, then the complement Ā of A is a concept.

(Civ) Nothing is a concept unless its being so follows from (Ci), (Cii) and (Ciii) above.

In most of the previous work at Case, (Ciii) above was replaced by "If A and B are concepts, then A ∩ B is a concept." However, in such cases the class of concepts does not form a Boolean Algebra except where each element of 𝒫 is a finite partition. This is because complements of concepts may not always be concepts if partitions have an infinite number of blocks. The difficulty is removed by the definitions above. It could also have been removed by allowing infinite unions and intersections; however, since the description languages presupposed in this book begin to have practical difficulties any time infinite operations are used (difficulties shared by any pattern recognition scheme using infinitary processes), it was considered more meaningful to have the definitions as above.

One can motivate the above definition of a concept by saying: "A concept is a set of things whose elements are recognizable as belonging to it by virtue of their properties."
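For a finite environment the class of concepts can be computed directly from (Ci)-(Civ) by closing the set of values under union and complement. The Python sketch below does exactly that (which is feasible only for very small Universes) and compares the result with a standard fact about finite Boolean algebras of sets: every concept is a union of the non-empty intersections obtained by choosing one value of each property. The helper names `concepts`, `atoms` and `unions_of` are hypothetical.

```python
from itertools import combinations, product


def concepts(universe, properties):
    """Close the set of values under union and complement, as in (Ci)-(Civ)."""
    U = frozenset(universe)
    found = {frozenset(v) for prop in properties.values() for v in prop.values()}
    while True:
        new = {U - a for a in found} | {a | b for a in found for b in found}
        if new <= found:
            return found
        found |= new


def atoms(properties):
    """Non-empty intersections obtained by picking one value of each property."""
    result = set()
    for choice in product(*(prop.values() for prop in properties.values())):
        block = frozenset.intersection(*map(frozenset, choice))
        if block:
            result.add(block)
    return result


def unions_of(blocks):
    """All unions of sub-collections of `blocks`, including the empty union."""
    blocks = list(blocks)
    return {frozenset().union(*combo)
            for r in range(len(blocks) + 1)
            for combo in combinations(blocks, r)}
```

With color = {red: {1, 2, 3}, blue: {4, 5, 6}} and size = {big: {1, 4}, small: {2, 3, 5, 6}} there are four such intersections, {1}, {2, 3}, {4} and {5, 6}, and hence 16 concepts; {2, 3} is a concept but {2} is not.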

For convenience of later discussion, we shall define an environment to be finite if 𝒫 is a finite family and each element of 𝒫 is a finite partition.


Given a subfamily 𝒫' of 𝒫, one defines a subclass C_𝒫' of the class of all concepts as follows:

(i) Any value of any element of 𝒫' is a member of C_𝒫'.

(ii) If A and B are members of C_𝒫', then A ∪ B and Ā are members of C_𝒫'.

(iii) Nothing is a member of C_𝒫' unless its being so follows from (i) and (ii) above.

By this definition, the class C_𝒫 is the class of all concepts.


A subfamily 𝒫' of 𝒫 is called a fine structure family if and only if C_𝒫' = C_𝒫.

A finite fine structure family 𝒫' = {P_1, P_2, ..., P_n} is said to be full if

$$p_1 \cap p_2 \cap \cdots \cap p_n \neq \emptyset \qquad (4.1)$$

for each choice of values p_1 ∈ P_1, p_2 ∈ P_2, ..., p_n ∈ P_n.
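For finite environments both definitions can be tested mechanically. The Python sketch below relies on a standard fact about finite Boolean algebras of sets rather than on anything stated in the text: C_𝒫' consists exactly of the unions of the non-empty intersections obtained by picking one value of each property in 𝒫' (its atoms), so a subfamily 𝒫' is a fine structure family exactly when it has the same atoms as 𝒫. Fullness asks that every such intersection be non-empty, as in (4.1). The example properties and function names are hypothetical.

```python
from itertools import product


def atoms(props):
    """Non-empty intersections p_1 ∩ ... ∩ p_n, one value p_i from each property."""
    blocks = (frozenset.intersection(*map(frozenset, choice))
              for choice in product(*(p.values() for p in props)))
    return {b for b in blocks if b}


def is_fine_structure(sub, family):
    """C_sub == C_family for a subfamily of a finite family, tested via atoms."""
    return atoms(sub) == atoms(family)


def is_full(sub):
    """Condition (4.1): every choice of one value per property has a non-empty intersection."""
    return all(frozenset.intersection(*map(frozenset, choice))
               for choice in product(*(p.values() for p in sub)))
```

On U = {1, ..., 8} with color = {red: {1..4}, blue: {5..8}}, size = {big: {1, 2, 5, 6}, small: {3, 4, 7, 8}}, shade = {light: odd, dark: even} and tone = {t1: {1, 2}, t2: {3..8}}, the family {color, size, shade} is a full fine structure family; the whole family is trivially fine structure but not full (t1 ∩ blue = ∅); {color, size} is not fine structure.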

A fine structure family of properties in any environment sets the limit to the distinguishability of members of the Universe, as will be shown presently. If the fine structure family is much smaller than the set 𝒫, then the properties outside the fine structure family merely affect the efficiency of description and not the ultimate capability of description. Nevertheless, since efficiency of description is crucial, the distinction between 𝒫 and 𝒫' is essential to the considerations of this chapter. To keep this basic role of the fine structure family clear, one defines as follows.
A real environment is a triple <U, 𝒫, 𝒫'> where <U, 𝒫> is an environment and 𝒫' is a fine structure subfamily of 𝒫. (𝒫' is not necessarily a proper subfamily of 𝒫, although in all interesting cases it would be.) 𝒫' will be called the input properties of the environment.

In the next few sections only finite real environments will be considered. In the work described in the next section, real environments with a full fine structure family of input properties will be assumed.

All description languages discussed in this chapter will have as their motivation the Boolean algebraic structure of the class of concepts as defined above. Although the languages will differ in the mode of describing concepts, one aspect of them will remain the same. This pertains to the fact that any concept of the form shown in expression (4.1) above is, in one sense, very basic: two members of U both of which belong to the set [...] For each i ∈ I, let p_i ∈ P_i. Let a and b be two elements of U such that, for each i ∈ I, both a and b are members of p_i. Let C be any concept. Then a ∈ C if and only if b ∈ C.

Proof: Let C be a concept according to (Ci) in the definition. Then there is an i ∈ I and a value p'_i ∈ P_i such that C = p'_i. Since a ∈ p_i and a ∈ C = p'_i, we have p_i ∩ p'_i ≠ ∅. But since P_i is a partition, p_i ∩ p'_i ≠ ∅ implies p_i = p'_i. Since by hypothesis b ∈ p_i, one has b ∈ C also. The converse follows in the same way. Now let the theorem be true for any concept which can be constructed with n or fewer set theoretic connectives, and let C be constructed with n + 1 set theoretic connectives. If C = A ∪ B, then either a ∈ A or a ∈ B. Let a ∈ A. Since A is constructible with n or fewer connectives, b ∈ A by the induction hypothesis, so that b ∈ A ∪ B. Similarly if a ∈ B. The converse follows also. If C = Ā, then, if b is not in C, it is in A. But A has n connectives. Hence, since b ∈ A, by the symmetry of the theorem a ∈ A, which is impossible since a ∈ C = Ā. Hence, b ∈ C.

Any element of U, then, can be completely specified (so far as its membership in all concepts is concerned) by indicating its membership in one element of each of the properties in 𝒫'. On the basis of this fact one can make the following definitions.

Given a finite real environment <U, 𝒫, 𝒫'>, a generalized object is a string of characters of the form

(P_i1 - p_i1; P_i2 - p_i2; ...; P_in - p_in)

where n is a finite integer, such that for each k, P_ik ∈ 𝒫' and p_ik ∈ P_ik, and p_i1 ∩ p_i2 ∩ ... ∩ p_in ≠ ∅.

A generalized object is an object if for all P ∈ 𝒫' and p ∈ P, the set p_i1 ∩ p_i2 ∩ ... ∩ p_in is either contained in or disjoint from p.
For any finite environment (or even an environment where 𝒫' is finite), n in an object may be considered to be equal to the cardinality of 𝒫', even where the environment is not full. In a full environment, of course, it is necessary to have n equal to the cardinality of 𝒫' in any object. It may be noticed that an object defines a concept, to wit the concept p_i1 ∩ p_i2 ∩ ... ∩ p_in. The symbols P_i1, ..., P_in therefore seem unnecessary in the definition of objects. However, retaining the properties, together with the values, has some important uses which will become clear towards the end of [...] variable. The predicate will be true for all members of the value p of the property P. Any member of the object (P_i1 - p_i1; P_i2 - p_i2; ...; P_in - p_in) then satisfies the statement S(x), where S(x) denotes the statement

$$S(x):\; (P_{i_1}(x) = p_{i_1}) \wedge (P_{i_2}(x) = p_{i_2}) \wedge \cdots \wedge (P_{i_n}(x) = p_{i_n}).$$

This statement may be considered to "describe" the object (P_i1 - p_i1; P_i2 - p_i2; ...; P_in - p_in) in the sense that the sentence S(a) will be true for all elements a of the object.

Obviously, concepts other than objects can be similarly "described" by statements involving the basic predicates P(x) = p, where P ∈ 𝒫 and p ∈ P, and the usual logical connectives. Given any object and the description of any concept in such a language, one can readily determine whether the object is contained in the concept or not. Algorithms and formats used for such recognition processes will be described presently. Meanwhile, certain important aspects of this elementary language will bear discussion.
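The elementary language just described is easy to interpret mechanically. The following Python sketch assumes that statements are nested tuples built from basic predicates ("is", P, p), meaning P(x) = p, and the connectives "and", "or" and "not", and that an object is given by the value it has for each input property used in the statement. Because all elements of an object agree on every such predicate, evaluating the statement once on the object's values decides whether the object is contained in the concept. The tuple encoding and the names `holds` and `describe_object` are hypothetical.

```python
def holds(statement, obj):
    """Evaluate a statement for an object given as {property: value}."""
    op = statement[0]
    if op == "is":  # basic predicate P(x) = p
        _, prop, value = statement
        return obj[prop] == value
    if op == "not":
        return not holds(statement[1], obj)
    if op == "and":
        return all(holds(part, obj) for part in statement[1:])
    if op == "or":
        return any(holds(part, obj) for part in statement[1:])
    raise ValueError(f"unknown connective: {op!r}")


def describe_object(obj):
    """The statement S(x) = (P_1(x) = p_1) and ... and (P_n(x) = p_n)."""
    return ("and",) + tuple(("is", prop, value) for prop, value in obj.items())
```

For the object {color: red, size: small}, S(x) is true of that object and false of {color: red, size: big}; the concept "big, or not red" does not contain it, but does contain {color: blue, size: small}.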

The central questions regarding descriptions are the following:

(i) Given a concept, how should its description be stored so as to use as small an amount of memory as possible?

(ii) How should the description be stored and processed so that, given an object and a description, one can efficiently determine whether the object is contained in the concept?

(iii) Given two sets of objects, how should one construct a short description of a concept which contains all elements of the first set and no element of the second set?

In this and the next chapter some of the earlier alternative attempts at answering these questions will be described. They are included here because they compare favorably with some published work by other workers, and, in the author's opinion, shed some light on the nature of the problem.


3. Conceptions - A Description Language

The discussions in this section are based on the work of the author [28] and Pennypacker [32]. The formalism developed here grew out of some previous thoughts of the author [7], which had led to a more primitive description language that was later abandoned in view of its inefficiency. However, some of the basic ideas relevant to that work have been retained; these have been discussed in the previous section.

Given an environment <U, 𝒫> and a non-empty concept C, a property is called directly relevant to C if and only if it has at least one value whose intersection with C is empty.

A property P is called relevant to a non-empty concept C with respect to a family F of properties if and only if either it is directly relevant to C, or there exists a property Q (≠ P) in F with a value q such that q ∩ C is non-empty and P is relevant to q ∩ C with respect to F.

In short, a property is not relevant to a concept when knowing the value of this property for an object does not (either by itself or in conjunction with other properties) help in the recognition of the object as belonging to the concept. This statement will be formalized presently; the following definitions and theorem will be needed for this formalization.

Given an environment <U, 𝒫> and a concept C, a finite subfamily F = {P_1, P_2, ..., P_n} of 𝒫 is called sufficient for C if either

(Si) n = 1 and C has non-empty intersections with more than one value of P_1,

or

(Sii) C = p_1 ∩ p_2 ∩ ... ∩ p_n, where p_i ∈ P_i, and there is no proper subfamily of {P_1, P_2, ..., P_n} with this property. (Note: n can be 1 in this case also.)

A set of properties F is called a sufficiency family for a concept C if either F is sufficient for C according to (Sii) above, or there is a member P ∈ F which is sufficient for C by (Si) above and F is the union of {P} with some sufficiency family of each of the non-empty intersections of C with the values of P.
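Because these recursive definitions are easy to misread, here is a brute-force Python sketch of them, tried on the properties p, q, S and W of the 40-element Universe introduced later in this section. It tests (Si), (Sii) and the recursive notion of a sufficiency family literally, caching intermediate answers; it is exponential in the size of the family and meant only for small examples. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

# Properties p, q, S, W of the 40-element Universe used later in this section.
p = {"p1": frozenset(range(1, 11)), "p2": frozenset(range(11, 21)),
     "p3": frozenset(range(21, 26)), "p4": frozenset(range(26, 31)),
     "p5": frozenset(range(31, 41))}
q = {"q1": frozenset({1, 2, 11, 12, 21, 26, 31, 32}),
     "q2": frozenset({3, 4, 13, 14, 22, 27, 33, 34}),
     "q3": frozenset({5, 6, 15, 16, 23, 28, 35, 36}),
     "q4": frozenset({7, 8, 17, 18, 24, 29, 37, 38}),
     "q5": frozenset({9, 10, 19, 20, 25, 30, 39, 40})}
S = {"S1": q["q1"] | q["q2"], "S2": q["q3"] | q["q4"], "S3": q["q5"]}
W = {"W1": q["q1"], "W2": q["q2"] | q["q3"], "W3": q["q4"] | q["q5"]}
PROPS = {"p": p, "q": q, "S": S, "W": W}


def meets_several(C, name):
    """(Si) for the one-property family {name}: C meets more than one value."""
    return sum(1 for v in PROPS[name].values() if v & C) > 1


def is_intersection(C, names):
    """C equals p_1 ∩ ... ∩ p_n for some choice of one value per named property."""
    return any(frozenset.intersection(*choice) == C
               for choice in product(*(PROPS[n].values() for n in names)))


def sufficient_sii(C, names):
    """(Sii): C is such an intersection and no proper subfamily also yields C."""
    names = sorted(names)
    return is_intersection(C, names) and not any(
        is_intersection(C, sub)
        for r in range(1, len(names)) for sub in combinations(names, r))


@lru_cache(maxsize=None)
def is_sufficiency_family(C, fam):
    """Recursive definition; C and fam must be frozensets (for caching)."""
    if sufficient_sii(C, fam):
        return True
    for name in sorted(fam):
        if not meets_several(C, name):
            continue
        pieces = [C & v for v in PROPS[name].values() if C & v]
        # Sufficiency families, drawn from fam, for each non-empty piece C ∩ v ...
        options = [[frozenset(sub) for r in range(1, len(fam) + 1)
                    for sub in combinations(sorted(fam), r)
                    if is_sufficiency_family(piece, frozenset(sub))]
                   for piece in pieces]
        # ... whose union with {name} must be exactly fam.
        if any(frozenset({name}).union(*choice) == fam
               for choice in product(*options)):
            return True
    return False
```

For instance, {27} = p4 ∩ q2 = p4 ∩ S1 ∩ W2, so both {p, q} and {p, S, W} are sufficient for it by (Sii), while {p, q, S} is not, since its proper subfamily {p, q} already yields the set.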


Theorem 4.2: Let {P_1, P_2, ..., P_n} be a sufficiency family for C. Let p_i ∈ P_i for each i. Then either

$$p_1 \cap p_2 \cap \cdots \cap p_n \cap C = \emptyset \quad\text{or}\quad p_1 \cap p_2 \cap \cdots \cap p_n \subseteq C.$$

Proof: If there is no property P_k in {P_i} having more than one value with non-empty intersection with C, then C is the intersection of the values of P_1, ..., P_n that meet it, and the theorem is evident. Suppose the theorem holds whenever there are k properties in {P_i} having more than one value with non-empty intersection with C. The theorem is true for k = 0. If it is true for k = m, let k = m + 1. Assume (without loss of generality) that P_1 has more than one value with non-empty intersection with C, and that for any p' ∈ P_1 such that C ∩ p' ≠ ∅, C ∩ p' has a sufficiency family which is a subset of {P_i}. If C ∩ p_1 = ∅, then the theorem follows immediately. Otherwise the sufficiency family for C ∩ p_1 which is a subset of {P_i} contains m or fewer properties which have more than one value with non-empty intersection with C ∩ p_1. Hence, by the induction hypothesis, either p_1 ∩ p_2 ∩ ... ∩ p_n ∩ C = ∅ or p_1 ∩ p_2 ∩ ... ∩ p_n ⊆ C ∩ p_1 ⊆ C.

This theorem leads to the following explication of the significance of relevant properties.

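Theorem 4.3: Let F = {P_1, P_2, ..., P_n} be a sufficiency family for the concept C. Let P_1 ∈ F be not relevant to C with respect to F, and let P_1 = {p_11, p_12, ..., p_1m}. For any set {p_i | p_i ∈ P_i, 2 ≤ i ≤ n}, if p_1l ∩ p_2 ∩ ... ∩ p_n ∩ C ≠ ∅ for some l, then for all k (1 ≤ k ≤ m)

$$p_{1k} \cap p_2 \cap \cdots \cap p_n \cap C \neq \emptyset.$$

Proof: By Theorem 4.2, the hypothesis p_1l ∩ p_2 ∩ ... ∩ p_n ∩ C ≠ ∅ implies p_1l ∩ p_2 ∩ ... ∩ p_n ⊆ C, that is, p_1l ∩ p_2 ∩ ... ∩ p_n ∩ C = p_1l ∩ p_2 ∩ ... ∩ p_n ≠ ∅. If p_1k ∩ p_2 ∩ ... ∩ p_n ∩ C = ∅ for some k, then P_1 is directly relevant to the non-empty set p_2 ∩ ... ∩ p_n ∩ C, which is reached from C by intersecting with values of P_2, ..., P_n; hence P_1 is relevant to C with respect to F, leading to a contradiction.

By Theorem 4.2, each intersection p_1k ∩ p_2 ∩ ... ∩ p_n is then contained in C, whatever the value of P_1. This theorem therefore indicates that when testing an object for inclusion in a concept, irrelevant properties need not be tested.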

Let C be a concept with sufficiency family {P_1, P_2, ..., P_n}, and let each property P_i be relevant to C with respect to {P_1, P_2, ..., P_n}. Then a list of k lists, headed by the name "C", is called a conception list of C if

either (Ci) k = 1 and the unique list is headed by the name "P_i", where 1 ≤ i ≤ n and P_i has more than one value with non-empty intersection with C. It is a list of ordered pairs consisting of the names of the values of P_i with non-empty intersections with C, together with the names for these intersections;

or (Cii) n = k, and each list is headed by a name "P_i" and contains a single ordered pair consisting of the name of the unique value p_i ∈ P_i which has non-empty intersection with C and of the name "C".

A set of conception lists forms a conception of a concept C if and only if it contains a conception list of C and a conception of every concept whose name occurs in that conception list. It is clear that a conception list of C satisfying (Cii) above is a conception in itself.

Some of the ideas associated with the above definitions and assertions can be exemplified by considering a specific Universe. Consider a Universe of Discourse consisting of 40 elements, as described below and exhibited in Figure 4.1. (This same basic Universe will also be used in exemplifying the ideas introduced in Section 5.)

The 40 elements will be denoted by the consecutive positive integers 1 through 40. The following subsets of the Universe will be taken to form the elements of the basic partitions.
    p1 = {1, 2, ..., 10}
    p2 = {11, 12, ..., 20}
    p3 = {21, 22, 23, 24, 25}
    p4 = {26, 27, 28, 29, 30}
    p5 = {31, 32, ..., 40}

    q1 = {1, 2, 11, 12, 21, 26, 31, 32}
    q2 = {3, 4, 13, 14, 22, 27, 33, 34}
    q3 = {5, 6, 15, 16, 23, 28, 35, 36}
    q4 = {7, 8, 17, 18, 24, 29, 37, 38}
    q5 = {9, 10, 19, 20, 25, 30, 39, 40}

    R1 = [...];  R2 = [...];  R3 = [...];  R4 = U - R1 - R2 - R3
    S1 = q1 ∪ q2;  S2 = q3 ∪ q4;  S3 = q5
    T1 = p1 ∪ p2;  T2 = p3 ∪ p4;  T3 = p5
    W1 = q1;  W2 = q2 ∪ q3;  W3 = q4 ∪ q5
Figure 4.2 identifies the basic sets and the elements of the Universe.

There are six properties in this environment:

    S = {S1, S2, S3}
    T = {T1, T2, T3}
    W = {W1, W2, W3}
    R = {R1, R2, R3, R4}
    p = {p1, p2, p3, p4, p5}
    q = {q1, q2, q3, q4, q5}

{p, q} is a fine structure family for this environment; so is {p, W, S}. The family {p, q} is a full fine structure family, while {p, W, S} is not full (for instance, W1 ∩ S2 = ∅). In the present discussion, p and q will be taken to be input properties, yielding the real environment <U, {S, T, W, p, q, R}, {p, q}>. In this environment {11, 13} is not a concept, for example: 11 and 12 both belong to p2 ∩ q1, so no concept can contain 11 without also containing 12.

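If one considers the concept T1, it can be seen that q is not relevant to it with respect to {p, q}: every value of q meets T1, and intersecting T1 with values of p yields only p1 and p2, which every value of q also meets. However, q is relevant to T1 with respect to {q, W, p}, since W1 ∩ T1 = {1, 2, 11, 12} does not meet q2. R is not directly relevant to T1, although it is relevant to T1 with respect to {p, R}.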
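The two claims about q and T1 can be checked mechanically. The Python sketch below reads the recursive definition of relevance as a reachability question, which is an interpretive assumption: starting from C, it repeatedly intersects with values of properties other than P (keeping only non-empty results) and reports relevance as soon as P is directly relevant to one of the sets reached. Only the properties p, q and W of the example are encoded, and the names `directly_relevant` and `relevant` are hypothetical.

```python
p = {"p1": frozenset(range(1, 11)), "p2": frozenset(range(11, 21)),
     "p3": frozenset(range(21, 26)), "p4": frozenset(range(26, 31)),
     "p5": frozenset(range(31, 41))}
q = {"q1": frozenset({1, 2, 11, 12, 21, 26, 31, 32}),
     "q2": frozenset({3, 4, 13, 14, 22, 27, 33, 34}),
     "q3": frozenset({5, 6, 15, 16, 23, 28, 35, 36}),
     "q4": frozenset({7, 8, 17, 18, 24, 29, 37, 38}),
     "q5": frozenset({9, 10, 19, 20, 25, 30, 39, 40})}
W = {"W1": q["q1"], "W2": q["q2"] | q["q3"], "W3": q["q4"] | q["q5"]}
PROPS = {"p": p, "q": q, "W": W}


def directly_relevant(name, C):
    """Some value of the property has an empty intersection with C."""
    return any(not (v & C) for v in PROPS[name].values())


def relevant(name, C, family):
    """Relevance of property `name` to non-empty C with respect to `family`.

    Explores every set reachable from C by intersecting with values of
    properties other than `name` (keeping non-empty results) and succeeds
    as soon as `name` is directly relevant to one of them.
    """
    stack, seen = [frozenset(C)], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if directly_relevant(name, current):
            return True
        for other in family:
            if other != name:  # Q must differ from P
                stack.extend(v & current for v in PROPS[other].values() if v & current)
    return False


T1 = p["p1"] | p["p2"]
```

Here relevant("q", T1, ["p", "q"]) is False, because every set reachable from T1 through p (namely T1, p1 and p2) meets all values of q, while relevant("q", T1, ["q", "W", "p"]) is True through W1 ∩ T1.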

The conception of the concept A = W2 ∩ R4 (i.e., the set {3, 4, 13, 14, 15, 16, 22, 27}) can be written in various ways.

    A:  R - (R4, A);  W - (W2, A)

would be a possible (and the shortest possible) conception. Another would be

    A:  p - (p2, B) - (p3, C) - (p4, D) - (p1, G)
    B:  q - (q2, E) - (q3, F)
    C:  p - (p3, C);  q - (q2, C)
    D:  p - (p4, D);  q - (q2, D)
    E:  p - (p2, E);  q - (q2, E)
    F:  p - (p2, F);  q - (q3, F)
    G:  p - (p1, G);  q - (q2, G)

using {p, q} as a sufficiency family. Another, somewhat shorter, would be the following:


p  -  (P2  ,B)  -  (Px  , G)  - 

(P4  - 

D)  - 

(o, 

j 

,c) 

f  ~ 

(P2 

/  B) 

W  - 

(W 

£ 

,  B) 

C 

j 

D 

| 

G 

j 

"  (P3  /C) 

1 

1  - 

(P4  - 

D) 

?  ’ 

<?1 

.G) 

q  -  (q2  - c) 

1 

S  - 
1 

<Sl  ' 

D) 

i 

q  - 

(q2 

•  G) 

j 

w  - 

(w2  , 

D) 

which  uses  (p,q,S,W)  as 

suf  f  1 

cienc 

* 

ami ly . 

It can be seen easily that {p, q, W} or {p, S, W} could be used as sufficiency families also. It can also be seen that the size of the sufficiency family used has no strong effect on the size of the conception. While {R, W} is a very effective sufficiency set for A, {p, q} is not. The size of the conception gets smaller when we augment this last sufficiency set to {p, q, S, W}. Changing {p, q, S, W} to {p, q, W} actually decreases the size of the conception.
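The conceptions of A above can be stored as plain data and interpreted by the recursive procedure outlined in Section 4. The Python sketch below illustrates that idea under stated assumptions and is not the report's own algorithm. A conception maps each concept name to its conception list, written as a list of (property, pairs) entries; a single entry whose pairs name other concepts is read as a list of type (Ci), anything else as type (Cii). As a simplification, each element of the Universe is handed to the recognizer together with its value for every property the conception mentions, including the non-input properties S and W. The names `object_of`, `recognize`, `CONCEPTION_PQ` and `CONCEPTION_PQSW` are hypothetical.

```python
# Properties of the 40-element example (R is omitted).
p = {"p1": frozenset(range(1, 11)), "p2": frozenset(range(11, 21)),
     "p3": frozenset(range(21, 26)), "p4": frozenset(range(26, 31)),
     "p5": frozenset(range(31, 41))}
q = {"q1": frozenset({1, 2, 11, 12, 21, 26, 31, 32}),
     "q2": frozenset({3, 4, 13, 14, 22, 27, 33, 34}),
     "q3": frozenset({5, 6, 15, 16, 23, 28, 35, 36}),
     "q4": frozenset({7, 8, 17, 18, 24, 29, 37, 38}),
     "q5": frozenset({9, 10, 19, 20, 25, 30, 39, 40})}
S = {"S1": q["q1"] | q["q2"], "S2": q["q3"] | q["q4"], "S3": q["q5"]}
W = {"W1": q["q1"], "W2": q["q2"] | q["q3"], "W3": q["q4"] | q["q5"]}
PROPS = {"p": p, "q": q, "S": S, "W": W}

# Concept name -> conception list: [(property, [(value, concept name), ...]), ...]
CONCEPTION_PQ = {
    "A": [("p", [("p2", "B"), ("p3", "C"), ("p4", "D"), ("p1", "G")])],
    "B": [("q", [("q2", "E"), ("q3", "F")])],
    "C": [("p", [("p3", "C")]), ("q", [("q2", "C")])],
    "D": [("p", [("p4", "D")]), ("q", [("q2", "D")])],
    "E": [("p", [("p2", "E")]), ("q", [("q2", "E")])],
    "F": [("p", [("p2", "F")]), ("q", [("q3", "F")])],
    "G": [("p", [("p1", "G")]), ("q", [("q2", "G")])],
}
CONCEPTION_PQSW = {
    "A": [("p", [("p2", "B"), ("p1", "G"), ("p4", "D"), ("p3", "C")])],
    "B": [("p", [("p2", "B")]), ("W", [("W2", "B")])],
    "C": [("p", [("p3", "C")]), ("q", [("q2", "C")])],
    "D": [("p", [("p4", "D")]), ("S", [("S1", "D")]), ("W", [("W2", "D")])],
    "G": [("p", [("p1", "G")]), ("q", [("q2", "G")])],
}


def object_of(e):
    """The value every property takes at element e, e.g. {"p": "p2", "q": "q2", ...}."""
    return {name: next(v for v, ext in prop.items() if e in ext)
            for name, prop in PROPS.items()}


def recognize(obj, concept, conception):
    """Decide whether the object lies in `concept` by walking its conception."""
    entries = conception[concept]
    prop, pairs = entries[0]
    if len(entries) == 1 and any(sub != concept for _, sub in pairs):
        # (Ci): look up the object's value of this one relevant property.
        for value, sub in pairs:
            if obj[prop] == value:
                return recognize(obj, sub, conception)
        return False  # the value does not occur in the list: not in the concept
    # (Cii): the concept is the intersection of the single listed values.
    return all(obj[P] == listed[0][0] for P, listed in entries)
```

For example, element 13 carries the values p2, q2, S1 and W2; under the {p, q} conception it is routed from A to B on p2 and from B to E on q2, and both (Cii) lists of E accept it. Element 17 is routed to B as well but is rejected there, because q4 does not occur in B's list.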

It should also be noted that when one uses a full fine structure sufficiency family, the conception of any concept other than values of fine structure properties and their intersections ends up containing the conception of every object contained in the concept. This is the consideration which leads to the need for properties other than input properties for purposes of description. This point will be discussed again in later sections.


4. A Recognition Algorithm Using Conceptions

The importance of conceptions arises because there exists an algorithm which can recognize a given object in a real environment as belonging to a concept whose conception is given. This algorithm will be given later, after some other ideas associated with recognition have been discussed. However, the basic idea involved in the algorithm can be indicated here quite easily. Let there be given an object $(p_{1i_1}, p_{2i_2}, \ldots, p_{ni_n})$ and the conception of a concept C. One can determine

whether the object belongs to the concept as follows. If the conception list of C includes only one relevant property P (according to (Ci)), then C has non-empty intersections with more than one value of P. Assume for simplicity that P is an input property. Then the object indicates the value of P in which it is contained. If this value does not occur in the conception list of C, then the object is not contained in C (see Lemma 4.7 below). On the other hand, if the object is contained in a value $p_i$ of P that does occur in the list, then one does not yet know for certain that the object is contained in C [...]. One then interrogates the conception of the concept whose name appears with $p_i$ in the conception list of C, that is, the conception of $C \cap p_i$ (see Lemma 4.4 below). [...] One must make sure that this recursive determination of the containment of the object does not involve one in an infinite loop. That this does not occur is indicated by Lemma 4.6 below, which shows that if a property occurs alone in the conception list of C, then further tests are unnecessary.

One also needs to discuss the case where a set of properties P, P', P'', ... occurs in the conception of C according to (Cii) above. If this is a unit set, then the procedure is as indicated above, except that the name paired with the value is C itself, and by Lemma 4.6 below the object is known to be contained in C. Otherwise, C has the form $p \cap p' \cap p'' \cap \cdots$, and one merely checks successively whether the object is contained in p, p', etc., until the list is exhausted. The validity of this process is brought out by Lemma 4.5 below.
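The procedure just described can be sketched in Python as follows. This is a minimal sketch under assumptions of our own, not the report's program: a conception is encoded as a dictionary mapping each property name to its list of ordered pairs (value, name of the intersection), an object is a dictionary giving the value of each input property, and, as in the qualitative discussion, every property named in a conception is taken to be an input property. The function name is_in_concept and the toy concepts D and E are hypothetical.

```python
# Hypothetical encoding, for illustration only:
#   conceptions[C] = {property: [(value, name of C ∩ value), ...]}
# An object maps every input property to the name of the value holding it.


def is_in_concept(obj, concept, conceptions):
    """Decide whether obj lies in `concept`, assuming that every property
    named in the conceptions is an input property (a plain look-up)."""
    entries = conceptions[concept]
    if len(entries) == 1:
        # Case (Ci), or case (Cii) with a unit set: a single property P.
        prop, pairs = next(iter(entries.items()))
        for value, inter in pairs:
            if obj[prop] == value:
                # The object lies in the value p_i. If C ∩ p_i is C itself
                # or the whole of p_i, no further test is needed; otherwise
                # recurse on the conception of C ∩ p_i (Lemma 4.4).
                if inter in (concept, value):
                    return True
                return is_in_concept(obj, inter, conceptions)
        return False  # the object's value of P is not listed (Lemma 4.7)
    # Case (Cii): C = p ∩ p' ∩ ...; test each value in turn (Lemma 4.5).
    return all(obj[prop] == pairs[0][0] for prop, pairs in entries.items())


# Toy concepts over input properties p (p1, p2, p3) and q (q1, q2):
#   D = (p1 ∩ q1) ∪ p3, described on p;  E = D ∩ p1 = p1 ∩ q1.
conceptions = {
    "D": {"p": [("p1", "E"), ("p3", "p3")]},
    "E": {"p": [("p1", "E")], "q": [("q1", "E")]},
}
print(is_in_concept({"p": "p1", "q": "q1"}, "D", conceptions))  # True
print(is_in_concept({"p": "p1", "q": "q2"}, "D", conceptions))  # False
print(is_in_concept({"p": "p3", "q": "q2"}, "D", conceptions))  # True
```

The recursion bottoms out either at a conjunctive conception, checked value by value, or at a pair whose intersection name is the concept itself or the value itself.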

The following four lemmata, although quite trivial, are included here for completeness and to establish that the structure "conception" is designed with the care that should go into the design of every data structure, no matter how complicated.

Lemma 4.4: If $p_i \in P$ has non-empty intersection with C, and if X is an object with non-empty intersection with $p_i$, then $X \subseteq C$ if and only if $X \subseteq C \cap p_i$.

Proof: The "if" part is obvious. For the "only if" part, one notes that if $X \cap p_i \neq \emptyset$, then $X \subseteq p_i$. (Otherwise, since X is an object and $p_i$ is a concept, some elements of X would lie in $p_i$ and others would not, contradicting Theorem 4.1.) Hence, if $X \subseteq C$, then $X = X \cap p_i \subseteq C \cap p_i$.

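Lemma 4.5: If $C = p_{1i_1} \cap p_{2i_2} \cap \cdots \cap p_{ni_n}$ and $X \neq \emptyset$, then $X \subseteq C$ if and only if $X \subseteq p_{1i_1}$ and $X \subseteq p_{2i_2} \cap \cdots \cap p_{ni_n}$.

Proof: If n = 1, the set $\{p_{2i_2}, \ldots, p_{ni_n}\}$ is empty and its intersection is the Universe. Also, $X \subseteq p_{1i_1}$ and $C = p_{1i_1}$; hence, $X \subseteq C$. The converse follows trivially, since $X \subseteq C = p_{1i_1}$.

Let n > 1. If $X \subseteq p_{1i_1}$ and $X \subseteq p_{2i_2} \cap \cdots \cap p_{ni_n}$, then $X \subseteq p_{1i_1} \cap p_{2i_2} \cap \cdots \cap p_{ni_n} = C$. Again, if $X \subseteq C = p_{1i_1} \cap \cdots \cap p_{ni_n}$, then X is contained both in $p_{1i_1}$ and in $p_{2i_2} \cap \cdots \cap p_{ni_n}$.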


Lemma 4.6: If P is sufficient for C, and $p_i$ is the value of P such that $C \cap p_i \neq \emptyset$ [...], then $p_i$ is sufficient for [...].

Proof: [...] only one value of P has non-empty intersection with C [...] can be sufficient for $C \cap p_i$ only by virtue of (Sii) above [...] is sufficient for [...].

Lemma 4.7: If $p_i \cap C = \emptyset$ and $X \subseteq p_i$, then $X \not\subseteq C$.

The proof is evident.

It was assumed in our qualitative discussion above that the properties mentioned in the conceptions are all input properties and hence are listed in the object. In this case, determination of the truth of statements like $X \subseteq p_i$ above is a trivial matter of look-up.

However, the test to be performed to find whether $X \subseteq p$ is not always such a straightforward process. If P is not a member of $\mathcal{P}'$, one needs extra information to know whether $X \subseteq p$. This information will be codified in the present recognition scheme by stipulating that conceptions for the values of P are available in an acceptable form. The following definitions clarify what is meant by "acceptable" here.

The description list of the Universe is a list of lists headed by the name "U." Each list in the list of lists is headed by the name of a property in $\mathcal{P}$ relevant to U with respect to $\mathcal{P}'$, and there is a list headed by the name of each relevant property in $\mathcal{P}$. The list headed by P is a list of ordered pairs containing the names of the values of P and the names of their intersections with U.

A set of lists is a description list structure of the Universe if it contains the description list of the Universe and a conception for every concept whose name occurs in the description list of the Universe.

It is to be noted that, given a real environment $\langle U, \mathcal{P}, \mathcal{P}' \rangle$, the description list of the Universe is unique. However, the conception list of any other concept is not necessarily unique; nor is a description list structure of the Universe unique. Nevertheless, the following theorem is true.


Theorem 4.7: In an environment $\langle U, \mathcal{P}, \mathcal{P}' \rangle$, each element of $\mathcal{P} - \mathcal{P}'$ is relevant to the Universe with respect to $\mathcal{P}'$.

Proof: Since $\mathcal{P}'$ is a fine structure family of properties, any concept is a Boolean function of values of elements of $\mathcal{P}'$. Also, since each element P of $\mathcal{P}$ is a non-trivial partition, any value p of P is a proper subset of the Universe. Hence, there is an object X which is not contained in p. Let $X = p_1 \cap p_2 \cap \cdots \cap p_n$, where n is the cardinality of $\mathcal{P}'$ and each $p_k$ is a value of a different element of $\mathcal{P}'$. Then, since $p \cap p_1 \cap p_2 \cap \cdots \cap p_n = \emptyset$, P is directly relevant to $p_1 \cap p_2 \cap \cdots \cap p_n$ and hence relevant to the Universe.


Hence, the name of every element of $\mathcal{P} - \mathcal{P}'$ heads some list in the description list of the Universe. However, this does not necessarily make a description list structure of the Universe "acceptable" information for finding whether an object X is contained in some value p of a non-input property. Some further definitions are needed.

Given a description list structure of the Universe, a property $P \in \mathcal{P}$ is called pre-defined if and only if either

    (Di)  $P \in \mathcal{P}'$,

or  (Dii) all properties whose names occur in the conceptions of all values of P are pre-defined properties.
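Pre-definedness and validity can be checked mechanically. The sketch below is a minimal illustration under our own assumptions: a hypothetical summary, mentions, records for each non-input property the set of properties occurring in the conceptions of its values, and the recursive definition is read as a least fixed point, so a property becomes pre-defined only once (Di) or (Dii) can be established from properties already known to be pre-defined. Under this reading a property described in terms of itself is never pre-defined. The check does not look for missing conceptions, which is a separate requirement of a description list structure.

```python
def predefined(mentions, input_props):
    """Least set of properties closed under (Di) and (Dii).

    mentions: hypothetical summary of a description list structure,
        mapping each non-input property P to the set of property names
        occurring in the conceptions of the values of P.
    """
    pre = set(input_props)                      # (Di): P is in P'
    changed = True
    while changed:
        changed = False
        for prop, used in mentions.items():
            if prop not in pre and set(used) <= pre:   # (Dii)
                pre.add(prop)
                changed = True
    return pre


def is_valid(mentions, input_props):
    """A structure is valid iff every property is pre-defined."""
    every = set(input_props) | set(mentions)
    return predefined(mentions, input_props) == every


inputs = {"p", "q"}
invalid = {"S": {"q"}, "T": {"T"}, "R": {"q", "S", "T"}}  # T uses T itself
valid = {"S": {"q"}, "T": {"p"}, "R": {"q", "S", "T"}}
print(sorted(predefined(invalid, inputs)))   # ['S', 'p', 'q']
print(is_valid(invalid, inputs), is_valid(valid, inputs))  # False True
```

Iterating to a fixed point avoids the infinite regress that a naive recursive reading of (Dii) would run into for a self-referential property.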

A description list structure of the Universe is called valid if every element of $\mathcal{P}$ is pre-defined. To exemplify validity, the following is the description list of the Universe shown in Figure 4.1 in Section 3:

    U
        R - (R1, R1) - (R2, R2) - (R3, R3) - (R4, R4)
        S - (S1, S1) - (S2, S2)
        T - (T1, T1) - (T2, T2) - (T3, T3)
        W - (W1, W1) - (W2, W2) - (W3, W3)

A possible description list structure of U might contain, in addition to the above, the following conceptions:


    R1
        q - (q1, α) - (q2, β) - (q3, γ) - (q4, δ)
    α
        p - (p5, α)
        q - (q1, α)
    β
        p - (p5, β)
        q - (q2, β)
    γ
        p - (p5, γ)
        q - (q3, γ)
    δ
        p - (p5, δ)
        q - (q4, δ)
    R2
        S - (S1, M) - (S2, N)
    M
        T - (T1, J) - (T2, K)
    N
        q - (q3, F) - (q4, H)
    F
        p - (p2, F)
        q - (q3, F)
    H
        p - (p2, H)
        q - (q4, H)
    [Further conceptions, including those of the values of W, are not legible in the source.]
    T1
        T - (T1, T1)
    T2
        T - (T2, T2)
    T3
        T - (T3, T3)

This description list structure would not be valid, since the values of T are described in terms of T, which is not a pre-defined property; also, descriptions of the concepts $q_2$, $q_3$, $q_4$, whose names occur in the descriptions of values of S, do not occur in the list. If, in the above list, one replaced the conceptions of $T_1$, $T_2$ and $T_3$ by

    T1
        p - (p1, p1) - (p2, p2)
    T2
        p - (p3, p3) - (p4, p4)
    T3
        p - (p5, T3)

and added the conceptions

    q2
        q - (q2, q2)
    q3
        q - (q3, q3)
    q4
        q - (q4, q4)

the description list structure would be valid.

Theorem 4.8: Given any object X, a property $P \in \mathcal{P}$, a value $p \in P$, and a valid description list structure of a finite Universe, it can be determined by a finite process whether $X \subseteq p$.

Proof: One first associates an integer with every property as follows. With each element of $\mathcal{P}'$ one associates the integer 1. For any other property P, one associates an integer $n_P$ defined as follows:

$$n_P = 1 + \max\{\, n_Q \mid Q \text{ occurs in the conception of some value of } P \,\}.$$

With a valid description list structure of the Universe, $n_P$ is uniquely defined for every property. The proof is by induction over $n_P$.

If $n_P = 1$, then $P \in \mathcal{P}'$ and the name of a value of P occurs in X; $X \subseteq p$ if and only if this name is identical with p. Hence, the theorem is true for P if $n_P = 1$.

Let the theorem be true if $n_P \le k$. If $n_P = k + 1$, then in the conception of p only such properties Q occur that $n_Q \le k$.

If the conception list of p contains the names of more than one value of a property Q, then $X \subseteq p$ only if $X \cap q \neq \emptyset$ for exactly one $q \in Q$. (Otherwise X would be contained in two disjoint sets.) Since there are only a finite number of such values, and by the induction hypothesis containment in each of them can be decided by a finite process, the containment of X in one of them can be determined by a finite process.

If the conception list of p has one value from each of a finite number of properties, then, since $n_Q \le k$ for each property Q in this list, the containment of X in p can be determined by a finite process.
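The integers $n_P$ used in this proof can be computed by a short memoized recursion, sketched below with a hypothetical encoding, mentions, that gives for each non-input property the properties occurring in the conceptions of its values. The recursion terminates only for a valid structure; a property described in terms of itself would recurse without end. The entries for S, T and R follow the example of this section; the entry for W is an assumption, since its conceptions are not legible in the source, chosen so that the output agrees with Table 4.1.

```python
def levels(mentions, input_props):
    """Return the integers n_P of the proof of Theorem 4.8.

    mentions[P] is the set of properties occurring in the conceptions of
    the values of the non-input property P (hypothetical encoding). The
    structure is assumed valid, so the recursion terminates.
    """
    n = {prop: 1 for prop in input_props}  # n_P = 1 for every P in P'

    def level(prop):
        if prop not in n:
            n[prop] = 1 + max((level(other) for other in mentions[prop]),
                              default=0)
        return n[prop]

    for prop in mentions:
        level(prop)
    return n


# S, T and R follow the example in the text; W's entry is assumed.
mentions = {"S": {"q"}, "T": {"p"}, "R": {"q", "S", "T"}, "W": {"q", "S"}}
print(dict(sorted(levels(mentions, {"p", "q"}).items())))
# {'R': 3, 'S': 2, 'T': 2, 'W': 3, 'p': 1, 'q': 1}
```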

To understand the way the integers $\{n_P\}$ are associated with the properties in the above proof, one can once more invoke the valid description list structure of the Universe exemplified in Figure 4.1. The integers associated with the various properties according to the scheme described in Theorem 4.8 are shown in Table 4.1. It will also be noticed that conceptions for some values of p and q did not have to be included in the valid description list structure, since their names never occurred on the right-hand side of any ordered pair in any of the conceptions.

In view of the discussions of this section and the last, the reader will be able to convince himself that the process indicated by the recursive flow chart shown in Figure A.2 can effectively determine whether an object X belongs to a concept C, if the conception of C and a valid description list structure of the Universe are available. In this flow chart three push-down stacks are used: j, P and C. The list L is a list of ordered pairs which is to be empty at the first entry to the program. The name of the concept is to be entered in stack C before starting the program. In the flow chart, all variables are to be interpreted in the normal manner as denoting the content of the address named. In the case of stacks, also, the same convention is followed, except that the content of the latest call is denoted by the stack name. When the stacks themselves are referred to rather than their contents, quotes are used.

Table 4.1

    Property    n_P
    p           1
    q           1
    R           3
    S           2
    T           2
    W           3
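The flow chart is not reproduced here, but its essential mechanism (recursion through the conceptions of non-input properties, together with the list L of property-value pairs that spares re-evaluation) can be sketched as below. This is an illustrative reconstruction under our own assumptions, not a transcription of the flow chart: Python's call stack stands in for the push-down stacks, the dictionary found plays the role of L, and the names make_recognizer and in_value and the toy concepts G and E are hypothetical.

```python
def make_recognizer(conceptions, input_props):
    """Build recognize(obj, concept) for a valid description list structure.

    conceptions: concept name -> {property: [(value, intersection), ...]},
        including a conception for every value of every non-input property.
    input_props: names of the input properties (the family P').
    """

    def in_value(obj, prop, value, found):
        if prop in input_props:           # input property: look it up
            return obj[prop] == value
        if prop in found:                 # already settled: consult L
            return found[prop] == value
        if contained(obj, value, found):  # use the value's own conception
            found[prop] = value           # record the pair (P, p) in L
            return True
        return False

    def contained(obj, concept, found):
        entries = conceptions[concept]
        if len(entries) == 1:             # case (Ci) or a unit set
            prop, pairs = next(iter(entries.items()))
            for value, inter in pairs:
                if in_value(obj, prop, value, found):
                    if inter in (concept, value):
                        return True
                    return contained(obj, inter, found)
            return False
        # case (Cii): the object must lie in every listed value
        return all(in_value(obj, prop, pairs[0][0], found)
                   for prop, pairs in entries.items())

    def recognize(obj, concept):
        return contained(obj, concept, {})  # L is empty at the first entry

    return recognize


# Toy structure: input properties p and q; non-input T with T1 = p1 ∪ p2
# and T2 = p3. G = (T1 ∩ q1) ∪ T2 is described on T; E = G ∩ T1.
conceptions = {
    "T1": {"p": [("p1", "p1"), ("p2", "p2")]},
    "T2": {"p": [("p3", "p3")]},
    "G": {"T": [("T1", "E"), ("T2", "T2")]},
    "E": {"T": [("T1", "E")], "q": [("q1", "E")]},
}
recognize = make_recognizer(conceptions, {"p", "q"})
print(recognize({"p": "p1", "q": "q1"}, "G"))  # True
print(recognize({"p": "p2", "q": "q2"}, "G"))  # False
print(recognize({"p": "p3", "q": "q2"}, "G"))  # True
```

In the first call, the value $T_1$ is established once while testing G and then reused from L when the conception of E is examined, which mirrors the "saving the trouble of a re-evaluation" remark in the worked example.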

One can recall at this point the different conceptions for the concept A in the previous example. It can be seen that the second conception (having $\{p, q\}$ as sufficiency set), although the most space-consuming, could be used most efficiently, because the route marked 2 in Figure 4.1 (necessitating the use of the description list structure of the Universe) is never used. On the other hand, the first conception shown (using W and R) needs a description list structure of the Universe.

It may be of some interest to indicate by an example the way the valid description list structure is used in the operation of the flow chart indicated in Figure 4.1 for recognition of an object as a subset of the concept A.

The assumption will be made that the conception of A being used happens to be

    A
        W - (W2, A)
        R - (R2, A)

and the object involved is [...].

Initially, the list L is empty and stack "C" contains the name "A." The test in box 1 finds W unmarked in the conception list of A, and the name W is entered in "P." [...] the next unmarked property, R, in the conception list of A. As a result, box 3 re-enters the program, testing the object as a subset of $R_2$. The property S, occurring in the conception list of $R_2$, being unmarked, not a member of $\mathcal{P}'$, and not occurring in L, box 2 matches S in the description list of the Universe with $S_1$ occurring in the conception list of $R_2$. Hence, box 3 re-enters the program, testing the object as a subset of $S_1$. The property q is unmarked, not in L, and occurs in $\mathcal{P}'$, so box 4 matches the value of q in the conception list of $S_1$ with the value of q in the object. Hence, box 5 tests the object as a subset of that value of q (as it did when testing for $W_2$) and, on success, box 8 places $(S, S_1)$ in the list L, following which box 5 re-enters the program, testing the object as a subset of M (the intersection of $S_1$ with $R_2$). T is not marked in the conception list of M, nor is it in $\mathcal{P}'$; hence, box 2 isolates the value $T_1$ in the description list of the Universe and box 3 tests the object as a subset of $T_1$. The property p being unmarked in the conception list of $T_1$, not occurring in L, and being a member of $\mathcal{P}'$, an attempt is made in box 4 to match the value of p in the object with the values $p_1$ and $p_2$ occurring in the conception list of $T_1$. This results in a failure exit, and box 3 tests the object as a subset of $T_2$. This succeeds (in the same way as $X \subseteq W_2$ succeeded). Hence, $(T, T_2)$ is placed in L by box 8, and box 5 tests the object as a subset of K. The first unmarked property in the conception list of K, namely T, occurs in L (saving the trouble of a re-evaluation), and its value ($T_2$) matches the value of T in K. Box 6 marks T in the conception list of K and tests the object as a subset of K with T marked; it finds S, the next unmarked property of K, in L and matches its value ($S_1$) in L with that in the conception list of K. Hence, S is marked in the conception list of K, and the next re-entry exits, unmarking the properties of K and thus recognizing K; hence, box 8 places $(R, R_2)$ in L. The intersection of $R_2$ with A being A, R is marked in the conception list of A, and the next recursive entry exits with success, finding the object to be a subset of A and unmarking the conception list of A.
5. Conjunctive and Simple Concepts

As has been pointed out before, every description language is constructed out of a set of predicates and a mode of combination of predicates to yield compound statements with only one free variable, so that in its interpretation a statement denotes a subset of the Universe of Discourse. In the languages discussed in this section, the basic predicates are also unary (containing a single variable). In the language discussed so far (whose sentences are conceptions), each set is described either as the union of a class of disjoint sets or as the intersection of a class of property values. The basic building blocks of the concepts, then, are the class of concepts each of which is an intersection of a class of property values. These building blocks will be called "conjunctive concepts" for the purposes of the present discussion.

The size of a conception describing a conjunctive concept certainly depends on the number of property values to be intersected to obtain the concept. The size of a conception describing concepts other than conjunctive concepts is larger than the sum of the sizes of the conceptions describing the disjoint conjunctive concepts of which the given concept is the union. If there is more than one conception for the same concept, then it is quite difficult to decide without careful study which of the given conceptions has the minimum size. It can be surmised that in a real environment where $\mathcal{P} = \mathcal{P}'$ and $\mathcal{P}'$ is full, the conception of a concept will be smaller, the fewer the conjunctive concepts used as building blocks for the concept. In what follows, the supposition will be made that if a conception describes a concept as a conjunctive concept, then this is the smallest conception for the concept. Whether such a conception exists for a concept certainly depends on the environment, i.e., on the structure of the properties available.

Given a certain real environment and a certain conception, it may be of interest to find a shorter conception which denotes the same concept. A method for doing this has been developed by J. C. Pennypacker. Other related methods developed by him will be discussed later.

Given a Universe (for instance, the set of all occurrences of bit-configurations on a square grid of photo-cells) whose elements can be coded into computer inputs, one can generally come up with some fine structure family of properties for that Universe. In the case of the square grid of photo-cells, for instance, the excitation value of a particular photo-cell divides the set of all bit-configurations on the grid into two disjoint subsets. The family of properties defined by the class of all photo-cells forms a full fine structure family.

The Universe of all configurations on a chess board has as a fine structure family the occupancy of each square on the board. (As an aside, this fine structure family, of course, is not full: not more than one white square can be occupied by a black bishop, for instance.) One can surmise with some confidence that finding a fine structure family of properties for a Universe is a problem that can safely be relegated to the intuition of the experimenter.

However, in most Universes (except those specially designed by psychologists for specific tests), one is specially interested in having descriptions for certain given concepts (the set of all [...] on a photo-cell grid; the [...] of all forcing situations on a chess-board; etc.). Generally these concepts are not conjunctive concepts if one restricts oneself to the input properties alone. In the interest of practicable brevity, it is essential to have properties in the environment such that the conceptions for these concepts are short and (if practicable) conjunctive. A large part of the effort in the field of pattern recognition is directed towards the search for suitable properties (the values of these properties are called "features" in the field). Often acceptable-looking features are assumed to exist, and statistical methods are developed to reduce the probability of incorrect classification by choosing the least harmful conjunctive concept to approximate the concept at hand. Concepts other than conjunctive ones are often succinctly expressed by invoking modes of combination other than the ones used in Logic. These will be discussed at the appropriate point later. Meanwhile, one may be tempted to pose the following general problem: "Given a class of concepts in a given real environment $\langle U, \mathcal{P}, \mathcal{P}' \rangle$, enlarge the class of properties such that each concept in the class is conjunctive." In this form,




the problem has a trivial solution: "Use each concept in the class together with its complement as a property." This, of course, does not reduce the memory size in any way. For a more realistic posing of the problem, one needs to take into account the size increase involved in incorporating these new properties into the description list structure of the Universe. The problem called "feature extraction" is closely related to this problem. To the best of the author's knowledge, such a problem has not been taken up in the literature in this form. Also, since the measure of size is highly language dependent, the development of more powerful description languages is a prerequisite.

The major point that will be considered in this section is a mode of combining unary predicates which makes it easy to have short descriptions, not only for conjunctive concepts but for a much larger class of concepts which shall be called "simple" concepts. The theory developed for the purpose will also indicate methods for describing non-simple concepts by the use of simple concepts which approximate them. Also, in a later chapter it will be shown how one can use this language for "generalization" or concept-learning.

At the present level of development of this theory, no distinction is made between input and non-input properties. Since, given an environment $\langle U, \mathcal{P} \rangle$, $\mathcal{P}$ itself is a fine structure family, one can say that the theory deals with a real environment $\langle U, \mathcal{P}, \mathcal{P} \rangle$. At the present stage of thought it is not clear whether the extension of the theory to the case where $\mathcal{P}' \neq \mathcal{P}$, in any but the most trivial way, will be of use. A large amount of theoretical development is also needed, because the class of simple concepts shows close relationships to topologies on the one hand and to the decomposition of games on the other. This will be indicated in detail later.

As before, finite environments will be considered.

Let $\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$ and, for each i ($1 \le i \le n$), let $P_i = \{p_{i1}, p_{i2}, \ldots, p_{ir_i}\}$. Given a concept X, define a set $X^s$, called the superconcept of X, as follows:

$$X^s = \bigcap_{i=1}^{n} \bigcup \{\, p_{ij} \mid p_{ij} \cap X \neq \emptyset \,\}$$

That is, for each i, one defines the set $X_i$, which is the union of those values of $P_i$ which have non-empty intersection with X; $X^s$ is obtained by taking the intersection of the sets $X_i$ for all values of i. As an example, in the environment indicated in Figure 4.2, the superconcept of the concept $a = \{5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 19, 20\}$ would be the concept $T_1 \cap (W_2 \cup W_3) \cap (p_1 \cup p_2) \cap (q_2 \cup q_3 \cup q_4 \cup q_5) \cap (R_2 \cup R_4) \cap (S_1 \cup S_2 \cup S_3)$; that is, $a^s = \{3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20\}$, which has a as a subset. This is true in general. That is,

Theorem 4.9: For any concept X, $X \subseteq X^s$.

Proof: For each i ($1 \le i \le n$),

$$X = X \cap U = X \cap \bigcup_{j=1}^{r_i} p_{ij} = \bigcup_{j=1}^{r_i} (X \cap p_{ij}).$$

However,

$$\bigcup_{j=1}^{r_i} (X \cap p_{ij}) = \bigcup \{\, X \cap p_{ij} \mid X \cap p_{ij} \neq \emptyset \,\}.$$

Hence,

$$X = \bigcup \{\, X \cap p_{ij} \mid X \cap p_{ij} \neq \emptyset \,\}.$$

But $X \cap p_{ij} \subseteq p_{ij}$. Hence,

$$X \subseteq \bigcup \{\, p_{ij} \mid X \cap p_{ij} \neq \emptyset \,\} = X_i$$

for each i ($1 \le i \le n$). Hence,

$$X \subseteq \bigcap_{i=1}^{n} X_i = X^s.$$
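The definition of $X^s$ and Theorem 4.9 can be checked on a small finite model. The sketch below assumes a toy Universe of its own (the integers 0 to 7, with one two-valued property per binary digit), not the environment of Figure 4.2; properties are represented as lists of disjoint frozensets covering the Universe, and the function name superconcept is ours.

```python
from itertools import combinations


def superconcept(X, properties, universe):
    """X^s: for each property, take the union of the values meeting X
    (the set X_i), then intersect these unions over all properties."""
    result = set(universe)
    for values in properties.values():
        X_i = set()
        for p in values:
            if p & X:  # p_ij ∩ X is non-empty
                X_i |= p
        result &= X_i
    return result


# Toy Universe (an assumption): the integers 0..7, with one two-valued
# property b0, b1, b2 per binary digit. Values are frozensets.
U = frozenset(range(8))
props = {
    f"b{k}": [frozenset(x for x in U if ((x >> k) & 1) == bit)
              for bit in (0, 1)]
    for k in range(3)
}
print(sorted(superconcept({0, 3}, props, U)))  # [0, 1, 2, 3]

# Theorem 4.9 on every subset of the toy Universe: X ⊆ X^s.
for r in range(len(U) + 1):
    for X in combinations(sorted(U), r):
        assert set(X) <= superconcept(set(X), props, U)
```

For $X = \{0, 3\}$ both values of b0 and of b1 meet X, so those factors are the whole Universe, and only the value "digit 2 is 0" survives for b2.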

From this theorem it follows that $X^s$ can be taken as an approximation for X in the sense that any element which is not a member of $X^s$ is certainly not a member of X. As a matter of fact, a much stronger statement can be made regarding the approximating ability of superconcepts. It can be noticed, for instance, that the complement of $(\overline{a})^s$, the superconcept of the complement of a, is the set $\{7, 8, 9, 10, 19, 20\}$, which is a subset of a. This also is true in general; that is:

Corollary 4.10: For any concept X, $\overline{(\overline{X})^s} \subseteq X$.

Proof: By Theorem 4.9, $\overline{X} \subseteq (\overline{X})^s$. Hence, $X \supseteq \overline{(\overline{X})^s}$.

Thus, $\overline{(\overline{X})^s}$ and $X^s$ can be looked upon as lower and upper bounds of X. Hence, if one stores descriptions of $X^s$ and $(\overline{X})^s$, one can recognize various objects as being definitely contained in X, and others as definitely not contained in X.
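The two bounds give a three-way test, sketched below on a toy Universe of our own (the integers 0 to 7, with one two-valued property per binary digit): an element in the lower bound is certainly in X, an element outside $X^s$ is certainly not, and anything else is left undecided. The function names superconcept and classify are assumptions made for illustration.

```python
def superconcept(X, properties, universe):
    """X^s: intersect, over all properties, the union of values meeting X."""
    result = set(universe)
    for values in properties.values():
        result &= set().union(*(p for p in values if p & X))
    return result


def classify(x, X, properties, universe):
    """Three-way test from Theorem 4.9 and Corollary 4.10."""
    X = set(X)
    upper = superconcept(X, properties, universe)              # X ⊆ X^s
    complement = set(universe) - X
    lower = set(universe) - superconcept(complement, properties, universe)
    if x in lower:
        return "in"          # certainly a member of X
    if x not in upper:
        return "out"         # certainly not a member of X
    return "undecided"


# Toy Universe (assumed): 0..7 with one two-valued property per binary digit.
U = frozenset(range(8))
props = {
    f"b{k}": [frozenset(x for x in U if ((x >> k) & 1) == bit)
              for bit in (0, 1)]
    for k in range(3)
}
print(classify(0, {0, 1, 2, 3, 4}, props, U))  # in
print(classify(4, {0, 1, 2, 3, 4}, props, U))  # undecided
print(classify(5, {0, 3}, props, U))           # out
```

For $X = \{0, 1, 2, 3, 4\}$ the lower bound is $\{0, 1, 2, 3\}$, so the element 4 of X is only "undecided": the bounds are safe but not exact.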

In addition to the fact that the superconcepts of a concept and of its complement yield approximations to the concept, it is to be noted that they also have rather simple descriptions in a specific language. This is brought out by the following theorem.

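Theorem 4.11: For any concept X,

$$X^s = \overline{\bigcup \{\, p_{ij} \mid p_{ij} \in P_i,\ P_i \in \mathcal{P},\ p_{ij} \cap X = \emptyset \,\}}.$$

Proof: Since the values of each $P_i$ partition the Universe, the union of the values of $P_i$ that meet X is the complement of the union of those that do not. Hence,

$$\begin{aligned}
X^s &= \bigcap_{i=1}^{n} \bigcup \{\, p_{ij} \mid p_{ij} \in P_i,\ p_{ij} \cap X \neq \emptyset \,\} \\
&= \bigcap_{i=1}^{n} \overline{\bigcup \{\, p_{ij} \mid p_{ij} \in P_i,\ p_{ij} \cap X = \emptyset \,\}} \\
&= \overline{\bigcup_{i=1}^{n} \bigcup \{\, p_{ij} \mid p_{ij} \in P_i,\ p_{ij} \cap X = \emptyset \,\}} \\
&= \overline{\bigcup \{\, p_{ij} \mid p_{ij} \in P_i,\ P_i \in \mathcal{P},\ p_{ij} \cap X = \emptyset \,\}}.
\end{aligned}$$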


Hence, the superconcept of any concept X can be described by storing the list of those property values which have empty intersections with X. Thus, the description of $a^s$ could be stored as

$$a^s = \overline{\bigcup \{\, p_3, p_4, p_5, q_1, R_1, R_3, T_2, T_3, W_1 \,\}}$$
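Theorem 4.11 suggests a compact representation, sketched below under our own assumptions: store only the (property, value) pairs whose value misses X, and accept an object exactly when none of its property values is on that list. This works because each object lies in exactly one value of every property. The encoding, the toy Universe (the integers 0 to 7, one two-valued property per binary digit) and the names stored_list, in_superconcept and as_object are hypothetical; this is not the recognition program the text promises for later.

```python
def stored_list(X, properties):
    """The description of X^s allowed by Theorem 4.11: the (property,
    value index) pairs whose value has empty intersection with X."""
    return {(name, j)
            for name, values in properties.items()
            for j, p in enumerate(values)
            if not (p & X)}


def in_superconcept(obj, stored):
    """obj maps each property to the index of the value containing it;
    the object is in X^s iff none of its values is on the stored list."""
    return all((name, j) not in stored for name, j in obj.items())


# Toy Universe (assumed): 0..7, property b_k is the k-th binary digit,
# and value index j of b_k is the set of elements whose digit equals j.
U = frozenset(range(8))
props = {
    f"b{k}": [frozenset(x for x in U if ((x >> k) & 1) == bit)
              for bit in (0, 1)]
    for k in range(3)
}


def as_object(x):
    return {f"b{k}": (x >> k) & 1 for k in range(3)}


stored = stored_list({0, 3}, props)
print(stored)                                                # {('b2', 1)}
print([x for x in sorted(U) if in_superconcept(as_object(x), stored)])
# [0, 1, 2, 3]
```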

Clearly, this new mode of description of concepts makes it necessary to have a new algorithm for determining whether a given object is contained in a superconcept or not. Such a program will be discussed later. Meanwhile, it is worth pointing out that the present language of description (describing the superconcepts of a concept and of its complement) does not restrict one to storing approximations alone. At some extra cost, all concepts can be described exactly if one allows in the language the capability of expressing the union of described sets. To see how this can be done, the following results are introduced.

Theorem 4.12: For all concepts X, $(X^s)^s = X^s$.

Proof: By Theorem 4.11,

$$X^s = \overline{\bigcup \{\, p_{ij} \mid p_{ij} \cap X = \emptyset \,\}}.$$

Hence, for any $p_{ij} \in P_i$ such that $p_{ij} \cap X = \emptyset$, one has $p_{ij} \subseteq \overline{X^s}$, i.e., $p_{ij} \cap X^s = \emptyset$. Applying Theorem 4.11 to $X^s$ then gives

$$(X^s)^s = \overline{\bigcup \{\, p_{ij} \mid p_{ij} \cap X^s = \emptyset \,\}} \subseteq \overline{\bigcup \{\, p_{ij} \mid p_{ij} \cap X = \emptyset \,\}} = X^s.$$

However, by Theorem 4.9, $X^s \subseteq (X^s)^s$. Hence, $(X^s)^s = X^s$.


[...] structure of the set of all subsets of K, considered as a lattice under inclusion, is anti-homomorphic to the lattice of all simple concepts under inclusion. For the purpose of this section it is only necessary to show that the set of all subsets of K, partially ordered by inclusion, is anti-homomorphic to the set of all simple concepts, partially ordered by inclusion. The mapping H described above is the anti-homomorphism involved. This is shown in the following theorem.

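Theorem 4.13: Let α and β be subsets of K and let H(α) and H(β) be the corresponding simple concepts. Then $\alpha \subseteq \beta$ implies $H(\beta) \subseteq H(\alpha)$.

Proof: If $\alpha \subseteq \beta$ then $p_{ij} \in \alpha$ implies $p_{ij} \in \beta$ for each $p_{ij} \in P_i$, $P_i \in P$. Hence,

$\{\, p'_{ij} \mid p_{ij} \in \alpha \,\} \subseteq \{\, p'_{ij} \mid p_{ij} \in \beta \,\}$.

Hence,

$H(\beta) = \bigcap \{\, p'_{ij} \mid p_{ij} \in \beta \,\} \subseteq \bigcap \{\, p'_{ij} \mid p_{ij} \in \alpha \,\} = H(\alpha)$.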
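A small brute-force check can make the mapping H and Theorem 4.13 concrete. The Python sketch below is illustrative only: the environment (two properties, shape and color, over four elements), the encoding of a property value $p_{ij}$ as a (property, value) pair, and all function names are assumptions. H(α) is computed, as in the proof above, as the intersection of the complements $p'_{ij}$ for $p_{ij} \in \alpha$.

```python
from itertools import chain, combinations

# Hypothetical toy environment (an assumption, not from the text): each
# element of the Universe is a dict from property names to values.
UNIVERSE = [
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "red"},
    {"shape": "circle", "color": "blue"},
]
ALL = frozenset(range(len(UNIVERSE)))

# K: every property value p_ij, written as a (property, value) pair.
K = frozenset((p, e[p]) for e in UNIVERSE for p in e)


def extent(pv):
    """Indices of the elements lying in the property value p_ij."""
    prop, val = pv
    return frozenset(i for i, e in enumerate(UNIVERSE) if e[prop] == val)


def H(alpha):
    """Simple concept H(alpha): intersection of the complements p'_ij, p_ij in alpha."""
    result = ALL
    for pv in alpha:
        result = result - extent(pv)   # intersect with the complement of p_ij
    return result


def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


# Theorem 4.13: alpha ⊆ beta implies H(beta) ⊆ H(alpha).
assert all(H(b) <= H(a) for a in subsets(K) for b in subsets(K) if a <= b)
print(sorted(H({("color", "red")})))   # elements that are not red: [1, 3]
```

The empty α gives the whole Universe, and each added property value can only shrink H(α), which is the anti-homomorphism in miniature.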

The converse of this theorem is not necessarily true: there may be more than one α with the same value for H(α). However, among all α having the same value for H(α) a unique one can be chosen.

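Theorem 4.14: If $H(\alpha) = H(\beta) = A$, then $H(\alpha \cup \beta) = A$.

Proof: Let α and β be two arbitrary subsets of K. Then $p_{ij} \in \alpha$ implies $p_{ij} \cap H(\alpha) = \emptyset$, whence $p_{ij} \subseteq H(\alpha)'$; similarly, $p_{ij} \in \beta$ implies $p_{ij} \subseteq H(\beta)'$. Hence,

$p_{ij} \in \alpha \cup \beta$ implies $p_{ij} \subseteq H(\alpha)' \cup H(\beta)'$, or $p'_{ij} \supseteq H(\alpha) \cap H(\beta)$,

whence

$\bigcap \{\, p'_{ij} \mid p_{ij} \in \alpha \cup \beta \,\} \supseteq H(\alpha) \cap H(\beta)$, or $H(\alpha \cup \beta) \supseteq H(\alpha) \cap H(\beta)$.

However, by Theorem 4.13,

$H(\alpha) \supseteq H(\alpha \cup \beta)$ and $H(\beta) \supseteq H(\alpha \cup \beta)$,

so that $H(\alpha) \cap H(\beta) \supseteq H(\alpha \cup \beta)$, which, with the previous inequality, shows

$H(\alpha) \cap H(\beta) = H(\alpha \cup \beta)$.

If $H(\alpha) = H(\beta) = A$, as given by the hypothesis, then

$A = H(\alpha) = H(\alpha) \cap H(\beta) = H(\alpha \cup \beta)$.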

Since the set of all subsets of K is finite, the set of all subsets α of K such that H(α) = A is a finite class of sets. If M(A) is the union of all subsets α of K such that H(α) = A, then, by repeated application of Theorem 4.14, H(M(A)) = A, and M(A) is the unique largest subset of K with this property. In the following theorem, discussion will be limited to those subsets of K which are M(A) for some simple concept A.
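The definition of M(A) can be checked in the same hypothetical toy environment (repeated here so that the block runs on its own). The sketch computes M(A) literally, as the union of all α with H(α) = A, and compares it with a shortcut that the text does not state but that follows from Theorem 4.13: for a simple concept A, M(A) is exactly the set of property values disjoint from A. The environment and all names are assumptions for illustration.

```python
from itertools import chain, combinations

# Same hypothetical toy environment as before (an assumption).
UNIVERSE = [
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "red"},
    {"shape": "circle", "color": "blue"},
]
ALL = frozenset(range(len(UNIVERSE)))
K = frozenset((p, e[p]) for e in UNIVERSE for p in e)


def extent(pv):
    prop, val = pv
    return frozenset(i for i, e in enumerate(UNIVERSE) if e[prop] == val)


def H(alpha):
    result = ALL
    for pv in alpha:
        result = result - extent(pv)
    return result


def subsets(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def M(A):
    """Union of all subsets alpha of K with H(alpha) = A (brute force)."""
    union = frozenset()
    for alpha in subsets(K):
        if H(alpha) == A:
            union = union | alpha
    return union


def M_direct(A):
    """Shortcut (derived, not from the text): all property values disjoint from A."""
    return frozenset(pv for pv in K if not (extent(pv) & A))


simple_concepts = {H(alpha) for alpha in subsets(K)}
for A in simple_concepts:
    assert H(M(A)) == A            # M(A) describes A exactly
    assert M(A) == M_direct(A)     # the shortcut agrees with the definition
print(len(simple_concepts))
```

The brute-force version visits all $2^{|K|}$ subsets and is only meant to mirror the definition; the shortcut needs one pass over K.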

Theorem 4.15: If A and B are simple concepts then $A \subseteq B$ implies $M(B) \subseteq M(A)$.

Proof: $A \subseteq B$; hence $A \cap B = A$. Since $A = H(M(A))$ and $B = H(M(B))$,

$H(M(A) \cup M(B)) = A \cap B = A$,

as indicated in the proof of Theorem 4.14. But by the definition of the function M,

$H(M(A) \cup M(B)) = A$ implies $M(A) \cup M(B) \subseteq M(A)$,

whence $M(B) \subseteq M(A)$.

Given any simple concept A, M(A) can be found effectively. It will be shown in the next chapter, on concept learning, that a rather straightforward algorithm exists which can find the value of M for the smallest simple concept containing a given set of exemplars. James Snediker has written a program, based on Windeknecht's work, which learns superconcepts of concepts from examples and stores them as values of the M function. For the purposes of this chapter, it will be assumed that the descriptions of simple concepts A are stored as the lists M(A). According to the last two theorems, given two concepts A and B, one can find whether A is a subset of B merely by comparing the lists M(A) and M(B): by Theorem 4.15, $A \subseteq B$ implies $M(B) \subseteq M(A)$, and conversely, by Theorem 4.13, $M(B) \subseteq M(A)$ implies $A = H(M(A)) \subseteq H(M(B)) = B$.
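Assuming that only the lists M(A) are stored, the inclusion test reduces to one set comparison. In this sketch (toy environment and names hypothetical) M(A) is computed as the set of property values disjoint from A, which for a simple concept A coincides with the union defining M(A); the loop confirms that the list test agrees with direct set inclusion for every simple concept of the toy environment.

```python
from itertools import chain, combinations

# Hypothetical toy environment (an assumption for illustration).
UNIVERSE = [
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "red"},
    {"shape": "circle", "color": "blue"},
]
ALL = frozenset(range(len(UNIVERSE)))
K = frozenset((p, e[p]) for e in UNIVERSE for p in e)


def extent(pv):
    prop, val = pv
    return frozenset(i for i, e in enumerate(UNIVERSE) if e[prop] == val)


def H(alpha):
    result = ALL
    for pv in alpha:
        result = result - extent(pv)
    return result


def M(A):
    """Stored description of a simple concept A: the property values disjoint from A."""
    return sorted(pv for pv in K if not (extent(pv) & A))


def included(m_a, m_b):
    """Decide A ⊆ B from the stored lists alone: A ⊆ B iff M(B) ⊆ M(A)."""
    return set(m_b) <= set(m_a)


alphas = chain.from_iterable(combinations(sorted(K), r) for r in range(len(K) + 1))
concepts = {H(a) for a in alphas}
for A in concepts:
    for B in concepts:
        assert included(M(A), M(B)) == (A <= B)
```

Note the reversal of direction: the smaller concept has the longer list, because more property values are excluded from it.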

If now one has the values of $M(A^s)$ and $M((A')^s)$ stored in a computer memory, one can deduce in an approximate manner whether a certain object X is a member of A. If $M(X) \not\supseteq M((A')^s)$ then $X \not\subseteq (A')^s$. However, X, being an object, is either contained in $(A')^s$ or is disjoint from it, according to Theorem 4.1. Hence, $X \subseteq ((A')^s)' \subseteq A$. If, on the other hand, $M(X) \not\supseteq M(A^s)$ then $X \not\subseteq A^s$. In this case $X \subseteq (A^s)'$, and since $A \subseteq A^s$, $X \not\subseteq A$. If neither of these cases holds, then no conclusion can be drawn regarding the inclusion of X in A.
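The three-way recognition procedure can be sketched as follows. Everything here is an assumption for illustration: a full toy environment in which each element serves as an object, and a helper that takes M of a superconcept $S^s$ to be the set of property values disjoint from S (a value disjoint from S is also disjoint from $S^s$, because $S^s$ lies inside the complement of every such value, and conversely, since $S \subseteq S^s$). The function returns True or False when a conclusion can be drawn and None otherwise.

```python
# Hypothetical full toy environment: every combination of values occurs once,
# so each element plays the role of an object.
UNIVERSE = [
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "red"},
    {"shape": "circle", "color": "blue"},
]
ALL = frozenset(range(len(UNIVERSE)))
K = frozenset((p, e[p]) for e in UNIVERSE for p in e)


def extent(pv):
    prop, val = pv
    return frozenset(i for i, e in enumerate(UNIVERSE) if e[prop] == val)


def M_super(S):
    """M(S^s): the property values disjoint from S (equivalently, from S^s)."""
    return frozenset(pv for pv in K if not (extent(pv) & S))


def approx_member(m_x, m_as, m_acs):
    """Decide X ∈ A from M(X), M(A^s) and M((A')^s); None means 'no conclusion'."""
    if not m_acs <= m_x:      # X not in (A')^s, hence X ⊆ ((A')^s)' ⊆ A
        return True
    if not m_as <= m_x:       # X not in A^s ⊇ A, hence X is disjoint from A
        return False
    return None


def classify(x, A):
    """Run the test for element x and concept A (a set of element indices)."""
    return approx_member(M_super(frozenset({x})), M_super(A), M_super(ALL - A))


squares = frozenset({0, 1})
print([classify(x, squares) for x in sorted(ALL)])    # [True, True, False, False]
mixed = frozenset({0, 3})                               # red squares or blue circles
print([classify(x, mixed) for x in sorted(ALL)])        # [None, None, None, None]
```

The second call illustrates the case in which neither test applies: for a concept such as red squares or blue circles, no property value lies inside A or inside its complement, so every answer is None.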

This approximate procedure is an analog of the Pennypacker recognition procedure described in the previous section. However, it is done by very simple programs based on very simple data structures.

If one can find simple methods for calculating the sets {α} (values of the M-function) whose corresponding simple concepts {H(α)} yield a given concept by union, then the above approximate recognition method can be improved quite easily. In that case, the problem reduces once more to assuring oneself that the number of simple concepts needed to describe a concept is not excessively large, even though one already has the assurance that it would not be larger than the number of conjunctive concepts needed to describe a concept.

Before this last aspect of efficiency of description is discussed (and this will be discussed in a more generalized case in the next section), one must also point out that for the list comparison algorithm to be effective the description M for the concepts involved has to contain every property in the environment, not merely a fine structure family of properties. Hence, it is required to introduce some further algorithms for deducing property-values from the input properties, as was done in the previous section. This would require the storing of certain property-values as described concepts. Efficient methods for this have not been developed yet.


6. A Generalized Description Language: Syntactic Axiomatisations

In  this  section  the  ideas  introduced  previously  will 
be  generalized  and  given  a  syntactic  form  similar  to  that  of  a 
formal  logic.  This  will  enable  generalizations  to  description 
languages  of  greater  flexibility  and  descriptive  strength  than 
are  presently  available.  Several  stages  of  development  of  such 
a  language  will  then  be  exhibited  and  their  use  exemplified. 
Indications  will  be  given  later  of  how  far  these  uses  have 
been  implemented  on  a  computer. 

It probably need not be established in any great detail, to any discerning reader, that what have been called conceptions and subsets of K in the previous sections are merely different ways of putting together atomic formulas to yield statement forms. These statement forms are characterized by the fact that they have only a single free variable. The component predicate letters are all unary (have only one argument) and give rise to predicates with one free variable. When these are put together to form statements, they all have the same free variable, so that the resulting statement form has one free variable only.

What have been called objects in the previous sections are also examples of compound statement forms with a single free variable. These statements with a single free variable define sets of elements for which the statements are true. On the basis of this, tests have been described so far which test the inclusion of one set in another. However, it has been tacitly implied that these tests will be used more often (and, in the case of Pennypacker's and the author's work, exclusively) when the included set is denoted by an object.

In the cases treated so far, where the environments have been finite, one could construct objects such that the name of every input property occurred in it. As a result, the sets indicated in Theorem 4.1 could actually be denoted by a finite statement corresponding to an object. Since two elements in such a set cannot be distinguished by any concept, one might consider each object as denoting a single element. As a result, the objects themselves could be considered as the elements of the Universe. This point of view will be pursued in the rest of this section, even though some of the discussions of this paragraph are invalid in cases where the Universe is not finite.


When an object k is an element of the value $p_{ij}$ of the property $P_i$, this fact can be expressed by a statement of the form $k \in p_{ij}$. This would necessitate that values of different properties have different names. However, there are certain advantages to using the same symbols for the names of values of different properties. Some of these advantages were indicated in Section 3 of Chapter I. If such similarities of names are allowed, then a statement of the form $k \in p_{ij}$ becomes ambiguous, since $p_{ij}$ and $p_{lm}$ (with $l \neq i$) may be the same symbol. It is more advantageous to express the fact that the object k is an element of the value $p_{ij}$ of property $P_i$ by a statement of the form $P_i(k) = p_{ij}$.
l  ii 
In the past, names of concepts have been attached to the conceptions or the lists M(A). In effect, the descriptions have stood for statements S(x) with a single free variable x, and the names C of the concepts attached to the descriptions have indicated that an object k is an element of the concept C if and only if the sentence S(k) is true; in effect, these have stood for statements of the form $x \in C \equiv S(x)$.

So far it has been assumed that names of properties, their values and concepts are symbols. However, it has already been indicated in Chapter I that various concepts can be given short descriptions if one allows the processing of the names of properties and values. It will, therefore, be of advantage if these are allowed to be objects, so that set theoretical processes can be carried out on them. This would allow the same programs that process objects to process the names also, and a large amount of descriptive power would be obtained without vitiating the flexibility of the processing and without unduly increasing the size of the programs which do the processing.

One, of course, must take cognizance of the fact that the set theoretical processes discussed so far are merely capable of working on sets defined by unary predicates, while most of the time the processing of names that goes on in pattern recognition activities involves the calculation of functions and the ascertainment of relations. This, however, presents no problems, since functions and relations, being sets of ordered n-tuples, may themselves be considered to be concepts in a Universe of ordered n-tuples. Also, n-tuples have the obvious n properties defined by each of their components. This fact will be made use of repeatedly in the examples that follow.

It may be pointed out, of course, that this point of view indicates that some objects under discussion will be constructed out of property names which are entirely different from the property names used for constructing other objects. This may be looked upon as indicating the existence of a set of environments rather than a single one. Alternatively, one may consider that each object is constructed out of only a subset of properties. This certainly would preclude the objects from being unit sets. Since such a possibility has to be admitted in any infinite environment (and, as will be seen presently, this is a very natural requirement), one need not disregard this latter interpretation of an object. There is, moreover, a philosophic justification to it when one considers that the description of an "object" (in common parlance) often depends on the context. When one talks about a person, for instance, he may be talking about every appearance of the person on every occasion ("He expresses himself well") or of a specific appearance of a specific trait ("He was angry to-day"). When one says, "The letter X," he may be talking of a class of letters, or of a single mark on paper viewed from five directions, or of the same single mark seen in a certain illumination at a certain time.


In any case, the complete set theoretical interpretation of the syntactic structure has not been investigated yet. Especially in its fully developed form, discussed at the end of this section, this lack of interpretation raises several questions regarding the formal properties of the logic involved. For the present, attention will be directed towards the syntactic properties of the language only, and its interpretation will be taken informally to be as motivated by the discussion above.

The major syntactic features of the language have already been discussed. Initially, one assumes a set of symbols, which will be taken to be countably infinite. To give syntactic meaning to such a set, they will be identified with words on a finite alphabet. The set of symbols, together with a small number of special symbols (the punctuation ",", ";", "(" and ")", and the signs "=", "∈", "≡", "~", "∨", "&" and "⊃" used in the rules below) and a specific string IN (standing for a single variable "Input"), will construct all valid phrases of the language. The most obvious interpretations will be on the set of all "objects" as defined before. In what follows the basic definitions of the syntactic entities will be given and later exemplified by some examples to bring out the power of the language under some interpretations.

1. A finite sequence of lower case Latin letters is a symbol.

2. A symbol is a term.

3. If α is a term and β is a term, then α, β is an ordered pair. α is the left hand element of the ordered pair and β is the right hand element of the ordered pair α, β.

4. An ordered pair is an ordered pair string; if α and β are ordered pair strings, then α; β is an ordered pair string.

5. If α is an ordered pair string then (α) is an object.

6. An object is a term.

An example of the syntactic appearance of an object may be worth giving here:

(name, harry; house, (number, five; street, luther))

denotes (under interpretation) a person called Harry whose house is distinguished from others by an address. The value of the property "house" itself is an object here.

7. The string IN is a term.

8. If α is a term and β is a term then α(β) is a term.

9. If α is a term and β is a term, then (α = β) is a statement.

10. If α is a statement and β is a statement, then ~α, (α ∨ β), (α & β), (α ⊃ β) are statements.

11. If α is a term and β is a symbol, then (α ∈ β) is a statement.

12. If α is a symbol and β is a statement, then IN ∈ α ≡ β is a concept. α is the name of the concept and β the intention of the concept.

It is worth remarking at this point that the syntactic entity "concept," as defined here, is at variance with the meaning attached to the word in the previous discussion. In the parlance of the previous discussion, a concept would be the set of all elements of the Universe which satisfy the intention of a syntactic entity "concept." The intention is what has previously been alluded to as the "description" of the concept.

Among the set of statements defined above, a subset will now be defined to be the set of "true statements." For this, some auxiliary definitions are needed. These involve a mapping (called "value") from a subset of the set of all terms and ordered pair strings into the set of all terms and ordered pair strings, defined as follows.

13. The value of a symbol is itself.

14. The value of IN is not defined.

15. The value of a term of the form α(β) is defined if and only if the values of α and β are defined, the value of β is an object and the value of α is the left hand element of some unique ordered pair in the value of β. In this case, the value of α(β) is the right hand element of the ordered pair of which the value of α is the left hand element.

As examples, the value of color((shape, square; color, blue)) is blue, while the values of color((color, red; color, blue)) and color((shape, square; size, big)) are undefined.

16. The value of an ordered pair α, β is defined if and only if the values of α and β are defined. In that case its value is α', β', where α' and β' are the values of α and β respectively.

17. The value of an ordered pair string α; β is defined if and only if the values of α and β are defined. In that case its value is α'; β', where α' and β' are the values of α and β respectively.

18. The value of (α) is defined if and only if the value of α is defined. In that case the value of (α) is (α'), where α' is the value of α.

19. An object is an exemplar if its value is itself.

20. Two symbols are identical if they constitute the same string of characters.

21. Two ordered pairs are identical if their left hand elements and their right hand elements are identical.

22. Two exemplars are identical if each ordered pair of one is identical to some ordered pair of the other and vice versa.

23. A statement (α = β) is true if and only if the values of α and β are defined and the value of α is identical to the value of β.

24. Given a set D of concepts, a statement is D-true if and only if it is true, or if the statement is of the form (α ∈ β), β is the name of some concept K in D, and the statement obtained by replacing every occurrence of IN in the intention of K by α is a D-true statement.

25. (α ∨ β) is D-true if and only if either α or β is D-true; (α & β) is D-true if and only if both α and β are D-true; (α ⊃ β) is D-true if and only if (~α ∨ β) is D-true; ~α is D-true if and only if α is not D-true and does not contain the term IN.

It may be worthwhile at this point to exemplify the utility of this system in terms of the example used at the end of Section 3 of Chapter I. Let D consist of the single concept

IN ∈ a ≡ ((head(borders(IN)) = t) & ((head(crosses(IN)) = f) ∨ ((tail(borders(IN)) = t) & (tail(crosses(IN)) = f))))

Then the statement ((crosses, (head, f; tail, t); borders, (head, t; tail, f)) ∈ a) is a D-true statement. This can be seen as follows.

Since the statement is of the form α ∈ β, one obtains by definition 24 above that the statement is D-true if and only if the statement obtained by replacing all occurrences of IN in the statement to the right of ≡ in the concept above by (crosses, (head, f; tail, t); borders, (head, t; tail, f)) is D-true. This statement is of the form α & β. The left hand conjunct of this statement is

(head(borders((crosses, (head, f; tail, t); borders, (head, t; tail, f)))) = t).

By definition 23 this is true if the values of the terms on the left and right of the = sign are identical. The value of the term t is t by definition 13. The value of the left hand term is defined by rule 15 to be the value of head((head, t; tail, f)), which is t again. So one of the conjuncts of the intention of the concept named "a" is true. The other conjunct is of the form α ∨ β and is D-true by rule 25 if either of the two disjuncts is D-true. The first disjunct is

(head(crosses((crosses, (head, f; tail, t); borders, (head, t; tail, f)))) = f)

which is again true by definitions 23, 15 and 13. One disjunct being true, the statement is D-true.

Before  exhibiting  by  some  more  examples  the  extent 
of  the  power  of  the  language,  it  is  worthwhile  to  point  out 
that  any  statement  in  this  language  which  does  not  contain  IN 
can  be  tested  for  truth  by  carrying  out  an  algorithm  on  the 
statement  which  is  closely  related  to  the  definitions  1-25 
above.  This  algorithm  is  shown  in  Figure  4.3  in  flow  chart 
form. 
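The truth test of definitions 13 to 25 can be sketched as a few recursive functions. This is not a transcription of the flow chart of Figure 4.3; it is a minimal Python sketch under stated assumptions: syntactic entities are encoded as tagged tuples (symbols as plain strings), D is a dict from concept names to intentions, ⊃ is left out since definition 25 rewrites it as ~α ∨ β, and ~α is read as "α is not D-true and contains no IN". All names are hypothetical. The last lines reproduce the worked example with the concept named "a".

```python
from typing import Dict

# Hypothetical tagged-tuple encoding (an assumption, not the author's):
#   symbol              -> Python str
#   IN                  -> ("IN",)
#   object (p1; ...)    -> ("obj", ((left, right), ...))
#   term alpha(beta)    -> ("app", alpha, beta)
#   statements          -> ("=", a, b), ("in", term, name), ("not", s), ("or", a, b), ("and", a, b)
IN = ("IN",)


def obj(*pairs):
    return ("obj", tuple(pairs))


def app(fn, arg):
    return ("app", fn, arg)


def identical(x, y) -> bool:
    """Rules 20-22: symbols by spelling; exemplars as sets of identical pairs."""
    if isinstance(x, str) or isinstance(y, str):
        return x == y

    def covered(p, q):
        return all(any(identical(l1, l2) and identical(r1, r2) for l2, r2 in q[1])
                   for l1, r1 in p[1])

    return x[0] == y[0] == "obj" and covered(x, y) and covered(y, x)


def value(t):
    """Rules 13-18; returns None where the value is undefined."""
    if isinstance(t, str):
        return t                                  # rule 13: a symbol is its own value
    if t[0] == "IN":
        return None                               # rule 14
    if t[0] == "obj":                             # rules 16-18: evaluate component-wise
        vals = tuple((value(l), value(r)) for l, r in t[1])
        if any(vl is None or vr is None for vl, vr in vals):
            return None
        return ("obj", vals)
    f, a = value(t[1]), value(t[2])               # rule 15: t is ("app", fn, arg)
    if f is None or a is None or isinstance(a, str):
        return None
    hits = [r for l, r in a[1] if identical(l, f)]
    return hits[0] if len(hits) == 1 else None    # the left element must be unique


def subst(x, a):
    """Replace every occurrence of IN in a term or statement x by the term a."""
    if x == IN:
        return a
    if isinstance(x, str):
        return x
    if x[0] == "obj":
        return ("obj", tuple((subst(l, a), subst(r, a)) for l, r in x[1]))
    return (x[0],) + tuple(subst(part, a) for part in x[1:])


def d_true(s, D: Dict[str, tuple]) -> bool:
    """Rules 23-25 for a set D of concepts, given as {name: intention}."""
    tag = s[0]
    if tag == "=":                                # rule 23
        l, r = value(s[1]), value(s[2])
        return l is not None and r is not None and identical(l, r)
    if tag == "in":                               # rule 24
        return s[2] in D and d_true(subst(D[s[2]], s[1]), D)
    if tag == "or":                               # rule 25
        return d_true(s[1], D) or d_true(s[2], D)
    if tag == "and":
        return d_true(s[1], D) and d_true(s[2], D)
    # tag == "not": substituting IN changes the statement only if it contains IN
    return subst(s[1], "") == s[1] and not d_true(s[1], D)


# The worked example: D holds the single concept named "a".
D = {"a": ("and", ("=", app("head", app("borders", IN)), "t"),
           ("or", ("=", app("head", app("crosses", IN)), "f"),
            ("and", ("=", app("tail", app("borders", IN)), "t"),
             ("=", app("tail", app("crosses", IN)), "f"))))}
k = obj(("crosses", obj(("head", "f"), ("tail", "t"))),
        ("borders", obj(("head", "t"), ("tail", "f"))))
print(d_true(("in", k, "a"), D))   # True
```

The recursion follows the shape of the definitions directly; whether it terminates depends on the concepts in D, just as for the algorithm of the text.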


In presenting the flow chart it has been assumed that the tests indicated in the control boxes of the flow chart can indeed be performed. The assumption can be justified on the basis of what is known about syntax-directed parsing to-day. The point will not be belaboured here.

Since the programs are recursive, some of the variables used are actually stacks. To distinguish them from other variables, their names have been written in upper case letters while other variables are named in lower case letters. As before, names are written in quotes and their contents without quotes.

It might have been useful at this point to include a proof that the algorithm exhibited in Figure 4.3 does terminate for every statement not containing IN. However, the proof is simple and the reader may be left to convince himself of the fact. The fact that the algorithm does terminate leads to the useful fact that all statements in this language not containing IN can be recognized as true or false. The language is "complete" in that sense. It is also "consistent" in the sense that not all statements are true. Also, it is "decidable" in the sense that given a statement it can be decided with a finite number of operations whether it is true or not. However, these assertions have very little significance in the mathematical sense, since what has been described above is not, in a strict sense, a logical theory. Later on in this section, in the interest of greater strength, the language will be extended into a logic. Meanwhile, the following examples will show that the language, even in its present form, has considerable strength.

The first example indicates how representations of integers and operations on integers can be described within the machinery of the language described so far. It will be considered that the integers are expressed by binary numerals. Each numeral will be considered to have two properties, "head" and "tail." The values of "tail" are "0" and "1," and stand for the least significant digit of the numeral. The values of "head" are either "null" or an integer, representing the more significant digits of the numeral. To make sure that confusions do not result from leading zeros, they will be disallowed in the description. In what follows a set of concepts will be introduced which will define positive integers, the relation of natural ordering among integers and sums of integers. From these, the reader will convince himself, the arithmetic of positive integers can be defined. Zero and negative integers can also be defined with some work.
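The numeral encoding can be mirrored outside the language for experimentation. In this hypothetical Python sketch a numeral is a pair (head, tail), with None standing for the symbol null; `is_numer` follows the two disjuncts of the concept numer, and `render` prints a numeral in the object notation of the text. The function names are assumptions for illustration.

```python
from typing import Optional, Tuple

# Hypothetical encoding (an assumption): a numeral is (head, tail), where tail
# is the least significant binary digit and head is None ("null") or a numeral.
Numeral = Tuple[Optional["Numeral"], int]


def to_numeral(n: int) -> Numeral:
    """Binary numeral of a positive integer; no leading zeros can arise."""
    if n < 1:
        raise ValueError("only positive integers are represented")
    return (None if n == 1 else to_numeral(n // 2), n % 2)


def from_numeral(x: Numeral) -> int:
    head, tail = x
    return tail if head is None else 2 * from_numeral(head) + tail


def is_numer(x) -> bool:
    """(head = null & tail = 1) or (head is a numeral & tail is a digit)."""
    if not (isinstance(x, tuple) and len(x) == 2):
        return False
    head, tail = x
    if head is None:
        return tail == 1
    return is_numer(head) and tail in (0, 1)


def render(x: Numeral) -> str:
    """Write a numeral in the object notation used in the text."""
    head, tail = x
    inner = "null" if head is None else render(head)
    return f"(head, {inner}; tail, {tail})"


print(render(to_numeral(4)))
# (head, (head, (head, null; tail, 1); tail, 0); tail, 0)
```

Requiring the innermost tail to be 1 when the head is null is what rules out leading zeros.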

IN ∈ digit ≡ ((IN = 0) ∨ (IN = 1))

IN ∈ lessd ≡ ((first(IN) = 0) & (second(IN) = 1))

IN ∈ numer ≡ (((head(IN) = null) & (tail(IN) = 1)) ∨ ((head(IN) ∈ numer) & (tail(IN) ∈ digit)))

IN ∈ less ≡ (((head(first(IN)) = null) & ~(head(second(IN)) = null))
    ∨ ((first, head(first(IN)); second, head(second(IN))) ∈ less)
    ∨ ((head(first(IN)) = head(second(IN)))
        & ((first, tail(first(IN)); second, tail(second(IN))) ∈ lessd)))

IN ∈ sumd ≡ (((second(IN) = 0) & (first(IN) = third(IN)))
    ∨ ((second(IN) = 1) & ~(first(IN) = third(IN))))

IN ∈ carry ≡ ((first(IN) = 1) & (second(IN) = 1))

IN ∈ sum ≡ ((((first, tail(first(IN)); second, tail(second(IN)); third, tail(third(IN))) ∈ sumd)
    & ((~((first, tail(first(IN)); second, tail(second(IN))) ∈ carry)
          & ((first, head(first(IN)); second, head(second(IN)); third, head(third(IN))) ∈ sum))
       ∨ (((first, tail(first(IN)); second, tail(second(IN))) ∈ carry)
          & ((first, head(first(IN)); second, head(second(IN)); third, head(third(IN))) ∈ ripple))))
    ∨ ((first(IN) = null) & (second(IN) = third(IN)))
    ∨ ((second(IN) = null) & (first(IN) = third(IN))))

IN ∈ ripplecar ≡ ((first(IN) = 1) ∨ (second(IN) = 1))

IN ∈ ripple ≡ ((~((first, tail(first(IN)); second, tail(second(IN)); third, tail(third(IN))) ∈ sumd)
    & ((~((first, tail(first(IN)); second, tail(second(IN))) ∈ ripplecar)
          & ((first, head(first(IN)); second, head(second(IN)); third, head(third(IN))) ∈ sum))
       ∨ (((first, tail(first(IN)); second, tail(second(IN))) ∈ ripplecar)
          & ((first, head(first(IN)); second, head(second(IN)); third, head(third(IN))) ∈ ripple))))
    ∨ ((first(IN) = null) & ((first, second(IN); second, (head, null; tail, 1); third, third(IN)) ∈ sum))
    ∨ ((second(IN) = null) & ((first, first(IN); second, (head, null; tail, 1); third, third(IN)) ∈ sum)))

Some explanation is probably necessary for the last three concepts. The elements of "sum" are ordered triples of numerals such that the third is the sum of the first two. The intention of "sum" essentially says, "The tail of the third is the sum of the tails of the first and second. If there is no carry then the head of the third is the sum of the heads of the first and second. If there is a carry, then the heads of the first, second and third are related by ripple." The description of ripple is the same as the description of sum, except for the addition of a bit in the least significant digit which is allowed to "ripple through."
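The recursion in "sum" and "ripple" can be tested by transcribing the concepts into ordinary recursive predicates. The Python sketch below is an illustration under assumptions, not part of the language: numerals are (head, tail) pairs with None for null, each predicate follows the disjuncts of the corresponding concept, and the final loop checks the transcription against ordinary integer arithmetic for small numbers. In particular, `lessd` is taken to hold when the first digit is 0 and the second is 1, which is the reading the concept less requires. All function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical transcription (an assumption, not part of the language):
# a numeral is a pair (head, tail) and None stands for the symbol null.
Numeral = Tuple[Optional["Numeral"], int]
ONE: Numeral = (None, 1)                  # the object (head, null; tail, 1)


def to_numeral(n: int) -> Numeral:
    """Binary numeral of a positive integer n, without leading zeros."""
    if n < 1:
        raise ValueError("only positive integers are represented")
    return (None if n == 1 else to_numeral(n // 2), n % 2)


def lessd(a, b):
    return a == 0 and b == 1


def sumd(a, b, c):                        # c is the digit sum of a and b, mod 2
    return (b == 0 and a == c) or (b == 1 and a != c)


def carry(a, b):
    return a == 1 and b == 1


def ripplecar(a, b):                      # carry out when a carry comes in
    return a == 1 or b == 1


def less(x, y):
    """x < y; head(null) is undefined, so no disjunct holds when x or y is null."""
    if x is None or y is None:
        return False
    (hx, tx), (hy, ty) = x, y
    return ((hx is None and hy is not None)
            or less(hx, hy)
            or (hx == hy and lessd(tx, ty)))


def sum_(x, y, z):                        # named sum_ to avoid the builtin sum
    """x + y = z, disjunct by disjunct as in the concept sum (null acts as 0)."""
    if x is not None and y is not None and z is not None:
        (hx, tx), (hy, ty), (hz, tz) = x, y, z
        if sumd(tx, ty, tz) and ((not carry(tx, ty) and sum_(hx, hy, hz))
                                 or (carry(tx, ty) and ripple(hx, hy, hz))):
            return True
    return (x is None and y == z) or (y is None and x == z)


def ripple(x, y, z):
    """x + y + 1 = z: like sum, with one extra bit entering at the lowest digit."""
    if x is not None and y is not None and z is not None:
        (hx, tx), (hy, ty), (hz, tz) = x, y, z
        if not sumd(tx, ty, tz) and ((not ripplecar(tx, ty) and sum_(hx, hy, hz))
                                     or (ripplecar(tx, ty) and ripple(hx, hy, hz))):
            return True
    return (x is None and sum_(y, ONE, z)) or (y is None and sum_(x, ONE, z))


# Check the transcription against ordinary arithmetic on small numbers.
for a in range(1, 17):
    for b in range(1, 17):
        assert less(to_numeral(a), to_numeral(b)) == (a < b)
        for c in range(1, 40):
            assert sum_(to_numeral(a), to_numeral(b), to_numeral(c)) == (a + b == c)
print(sum_(to_numeral(1), to_numeral(3), to_numeral(4)))   # True: binary 1 + 11 = 100
```

Each recursive call moves to the heads, so the recursion always terminates on well-formed numerals.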

An example, representing the binary sum 1 + 11 = 100, will probably clarify matters further. The element of sum of concern here is

(first, (head, null; tail, 1); second, (head, (head, null; tail, 1); tail, 1); third, (head, (head, (head, null; tail, 1); tail, 0); tail, 0))

Initially, tail(first(IN)) = 1, tail(second(IN)) = 1 and tail(third(IN)) = 0, satisfying the first conjunct of the first disjunct in the intention of the concept named "sum." Hence, since (first, 1; second, 1) is an element of "carry," the object (first, null; second, (head, null; tail, 1); third, (head, (head, null; tail, 1); tail, 0)) has to be a member of "ripple" by the second disjunct of the second conjunct of the first disjunct in the intention of "sum." Since first(IN) for this new object is "null," the third disjunct of "ripple" has to be satisfied; that is, the object (first, (head, null; tail, 1); second, (head, null; tail, 1); third, (head, (head, null; tail, 1); tail, 0)) has to belong to "sum." Again, the tails satisfy the first conjunct in the first disjunct. Also, there is a carry, so that the object (first, null; second, null; third, (head, null; tail, 1)) must be a member of ripple. Hence, again by the third disjunct of ripple, (first, null; second, (head, null; tail, 1); third, (head, null; tail, 1)) must belong to sum. By the third disjunct of sum one must have ((head, null; tail, 1) = (head, null; tail, 1)), which is true.

One might object to the rather cumbersome nature of the concepts. However, any statement describing a complicated operation like arithmetic sum is bound to be somewhat cumbersome. The present statements are certainly less cumbersome than, say, the Boolean expression describing a parallel thirty-six bit adder, and yet they express operations on strings of arbitrary length.

However, the expression (head, (head, (head, null; tail, 1); tail, 0); tail, 0) is certainly a more cumbersome expression than 100 or even (x = 1) & (y = 0) & (z = 0). Later on in this section methods will be considered which will reduce the unwieldiness of objects in the language and will also enable the attachment of names to objects. This way, it will be easier to express operations by means other than through relations.

The importance of the last sentence above becomes clear when one wants to express facts like 1+1+1=11 and 11+101=100+100. Unless some concept other than sum is to be introduced anew (a wasteful procedure), one has to introduce existential and universal quantifiers into the language so that the above facts can be expressed, respectively, by saying "for any z such that 1+1 = z it is true that z+1 = 11" and "for any z such that 11+101 = z it is true that 100+100 = z." This, of course, renders the recognition process of Figure 4.3 inadequate. Before these facts are discussed, one more example will be given which will bring out some further strengths and weaknesses of the language.

One  can  imagine  classifying  the  resider' i  of  a  street 
by  their  name,  their  house  of  residence,  their  age  range 
(small,  big)  and  sex.  The  house  of  residence  may  be  described 
by  their  size,  color  and  level  of  beauty  (and  perhaps  even 
number,  which  would  render  the  environment  for  houses  non-full, 
which  it  is  anyway) .  A  typical  person  might  be  an  object  like 
(name,  lucy;  age,  small;  house,  (size,  small;  color,  white; 
look,  pretty) ;  sex,  girl) 

In such a Universe, a relation like fatherhood can
be expressed as follows:

IN ∈ father ≡ ((house(first(IN)) = house(second(IN))) & (age(first(IN)) = big) & (sex(first(IN)) = man))

that is, "of two people in the same house, the adult male is
the other one's father." The description is certainly incomplete,
but can be improved upon.
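The recognition side of this definition can be sketched directly. In the illustrative Python sketch below, people are dicts with the properties used above, a candidate element of father is a dict with properties first and second, and `harry` and `susan` are made-up residents; `is_father` is a hypothetical helper.

```python
def is_father(pair: dict) -> bool:
    """Mirror of: house(first) = house(second) & age(first) = big & sex(first) = man."""
    first, second = pair["first"], pair["second"]
    return (first["house"] == second["house"]   # same house, compared structurally
            and first["age"] == "big"
            and first["sex"] == "man")


home = {"size": "small", "color": "white", "look": "pretty"}
harry = {"name": "harry", "age": "big", "house": dict(home), "sex": "man"}
susan = {"name": "susan", "age": "small", "house": dict(home), "sex": "girl"}

print(is_father({"first": harry, "second": susan}))  # True
print(is_father({"first": susan, "second": harry}))  # False
```

As the text says, the test is incomplete: any adult male sharing the house passes it. It also only recognizes the pair; it gives no way of naming Harry "Susan's father".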

The difficulty in the way here again is one's inability
to make such simple statements as "Harry is Susan's father."

One can try to get around this by including the father's name
in the object, but then one has to decide whether
to include the father's name only or to include the entire
object describing the father. The second gives rise to an
infinite recursion; the first leads to the obvious problem
of finding the father's father. Any cross-indexing needs the
attachment of names to objects, as is needed in the case of
numerals.

These and related difficulties can be resolved (as
far as the descriptive strength of the language is concerned)
by introducing variables other than IN into the language and
allowing logical quantifiers. Also, capabilities have to be
established for naming objects by strings of symbols. However,
strings of symbols, unlike symbols, should be processable. To
enable this, a new syntactic entity will be introduced. In
this new notation, the object (head, (head, (head, null; tail, 1);
tail, 0); tail, 0) could have the representative
STRING(1, 0, 0, numer), and if (first, α; second, β) was an element of
"fatherhood," then α could be represented by STRING(father,
of, β). Such naming processes, of course, should be describable
within the language. Also, one should have the freedom of
introducing axioms in D other than concepts. To do this, the
symbol ≡, which so far had no logical significance, has to become
a part of the theory. The concept of "proof" has to be introduced
as in any logic. This renders recognition of objects
as belonging to concepts more difficult. However, Milliken
has shown that a suitable modification of the algorithm shown
in Figure 4.3 can be made which enables recognition of some
concepts even in this extended language [43]. Since the extended
language enables its own description and can describe integers,
it is clear that a mechanical recognition procedure for all
objects is impossible [44].

In what follows, the extended language will be introduced
and exemplified. A large part of the language will be
similar to the one discussed before.

a) The Syntax

1. Any string of lower case latin letters and arabic
numerals is a symbol. A symbol is a term. Any
string of greek letters is a variable. A
variable is a term.

2. If A and B are terms, then A, B is an ordered
pair. An ordered pair is an ordered pair string.
If A and B are ordered pair strings, then A; B
is an ordered pair string. If A is an ordered
pair string, then (A) is an object. An object
is a term.

The important thing added to the syntax at this point
is the variable. The discerning reader probably noticed before
this that IN was playing a role similar to a variable in the
previous discussion. However, a larger repertoire of variables
is necessary for full flexibility of use. Going on with the
syntax:

3. If A is a term and B is a term, then A(B) is a
term.

The term

color(α)

stands for the English phrase "the color of α." Generally
such a term is meaningful only when α stands for some object
like

(color, red; size, big; number, 135);

in this case color(α) would stand for

red.

However, this interpretation, unlike the previous case, is part
of our axiom set now.
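A minimal sketch of this reading of color(α), assuming objects are stored as Python lists of (first, second) pairs. The helper `apply_property` is hypothetical; it returns None when the term denotes nothing, mirroring the requirement, made precise in rule 1(iii) of the axiom schemata, that the property name be the first element of a unique pair of the object.

```python
from typing import List, Optional, Tuple

# An object is a list of ordered pairs (first element, second element).
Obj = List[Tuple[str, object]]


def apply_property(prop: str, obj: Obj) -> Optional[object]:
    """Denotation of the term prop(obj): the second element of the unique
    ordered pair of obj whose first element is prop, or None if there is
    no such pair or more than one."""
    matches = [second for first, second in obj if first == prop]
    return matches[0] if len(matches) == 1 else None


alpha = [("color", "red"), ("size", "big"), ("number", "135")]
print(apply_property("color", alpha))   # red
print(apply_property("weight", alpha))  # None
```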

4. If A and B are terms, then (A = B) and (A ∈ B) are
statements. If A and B are statements, then
(A v B), (A & B), (A ⊃ B), (A ≡ B) and ~A are statements.

5. If A is a statement and B is a variable, then
(∀B)A and (∃B)A are statements.

Rule 5, above, is one of the major reasons for introducing
variables as parts of the syntax. Also, the use of ≡
now has a logical interpretation as a propositional connective,
rather than merely as a cue for recognition as it had been in
the previous discussion.

It has already been pointed out that in this
description language one has the freedom of giving names to
sets of objects and using these names to define new sets of
objects. However, these names are arbitrarily given symbols
and have no syntactic relationship to the set of
objects being defined. Hence, if one had to define a class of
sets which had similar structures, this similarity would not
be reflected in the given names. Thus, the set of all numbers
greater than 3 and the set of all numbers greater than 50 would
be given two different names, and the fact that each set has a
lower bound would be lost. The reader is to recall that calling
them things like "greater than 3" does not help, since the language
deals with each symbol as a single entity.

A part of what follows is directed towards giving a
number of string processing abilities to any automaton using
the language, and towards using these abilities to tie the names
of sets and objects to their structure. However, in line
with previous procedures, the mappings which define the process
will be included only in the axioms of the system. What follows,
then, is only the syntactical part.

6. A symbol is a train. If A and B are trains,
then A, B is a train. If A is a train, then
STRING(A) is a string. A string is a train.

7. A term is an operand. If A and B are operands,
then A, B is an operand. If A is an operand,
then TIE(A) is a representative. A string is
a representative. A representative is an
operand.

8. EMPTY is an operand. If A is a representative,
then STRIP(A), END(A) and REST(A) are operands.

9. If A is a representative, then REPINV(A) is a
term. If A is a term, then REP(A) is a
representative.


Examples of strings and representatives are

STRING(2, 0, 1, num)

STRING(STRING(1, 0, num), STRING(2, 1, num), sum)

TIE(STRIP(REP(α)), END(β))

TIE(brother, of, name(first(α)))

10. If A and B are representatives, then (A = B) is
a statement.
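The string operations can be sketched in Python under some assumptions that the syntax alone does not fix: trains are tuples, STRING is a small frozen dataclass, EMPTY is the empty tuple, and the results follow axioms 1(iv) to 1(vii) given below. All names are hypothetical. REP and REPINV are left out because they are fixed only by axioms such as the naming axiom for "father", not by a computation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class String:
    """STRING(A): a processable name whose train A is stored as a tuple."""
    train: tuple


Item = Union[str, String]   # an element of a train: a symbol or a string
EMPTY: tuple = ()           # assumption: EMPTY behaves as the empty train


def tie(*operands) -> String:
    """TIE(A1, ..., An): splice train operands (tuples), keep single items,
    drop EMPTY, and wrap the result in STRING."""
    items = []
    for op in operands:
        if isinstance(op, tuple):
            items.extend(op)
        else:
            items.append(op)
    return String(tuple(items))


def strip(s: String) -> tuple:
    """STRIP(STRING(B)) = B: remove the STRING wrapper."""
    return s.train


def end(s: String) -> Item:
    """END(STRING(D, B)) = B: the last element."""
    return s.train[-1]


def rest(c: Union[str, String]) -> tuple:
    """REST(STRING(B, D)) = B (all but the last element); REST of a symbol is EMPTY."""
    if isinstance(c, str):
        return EMPTY
    if len(c.train) < 2:
        raise ValueError("REST is only specified for strings with at least two elements")
    return c.train[:-1]


name = String(("2", "0", "1", "num"))
print(end(name), rest(name))            # num ('2', '0', '1')
print(tie(strip(name), EMPTY) == name)  # True
print(tie("brother", "of", "harry"))
```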

The following examples will indicate to the reader
the usefulness of strings in obviating the difficulties
mentioned earlier. Although the concept of "truth" has not
been introduced in this formalism, the reader should be able
to follow the examples from an intuitive understanding of the
meaning of truth.

Let there be a concept in D as given before:

α ∈ father ≡ ((age(first(α)) = big) & (sex(first(α)) = man)
& (house(first(α)) = house(second(α))))

A typical object in "father" might be

(first, (name, frank; sex, man; house, (size, small;
color, blue; look, pretty); age, big); second, (name,
susan; sex, girl; house, (size, small; color, blue;
look, pretty); age, small)).

One can call Frank "Susan's father", a rather
generalized naming operation which was impossible in the
language so far. For this, one can now define an axiom as
follows:

(α ∈ father) ⊃ (REP(first(α)) = TIE(father, of, name(second(α))))

which, by the rules described later, would yield the statement:


1 : 


;  : 

i ,  .  .a- 


v-. 


h'; 

i  ?  '  ;  -• 

V’  u*.  V 


;>• 

•\  1 


L 


-252-  '! 

'] 

(REP((name, frank; sex, man; house, (size, small; color, blue;
look, pretty); age, big)) = STRING(father, of, susan))

The exact way in which the truth of this statement is
derived in the language will be shown after the axiom
system has been discussed.
b) The axiom schemata

1. A statement (A = B) is an axiom if and only if

(i) A and B are each the same (identical) term.
Two objects are identical if every ordered
pair appearing in A is identical to some
ordered pair appearing in B and vice versa.
Two ordered pairs are identical if their
first elements are identical and their
second elements are identical.

(ii) A and B are identical trains.

(iii) A is a term of the form C(D), where D is
an object, C is the first element of some
unique ordered pair of D, and B is the
second element of the same ordered pair.

(iv) A is of the form TIE(C) or TIE(C, EMPTY)
or TIE(EMPTY, C), C is a train and B is the
string STRING(C).

(v) A is of the form END(C), where C is a
string of the form STRING(D, B) (alternatively
STRING(D, A)), and B is either a symbol or a
string.

(vi) A is of the form REST(C) and C is a
string of the form STRING(B, D), where D is
a symbol or a string; or C is a symbol
and B is EMPTY.

(vii) A is of the form STRIP(STRING(B)).

2. Every statement ~(A = B) is an axiom if and only if

(i) A and B are symbols but not identical, or
A (alternatively B) is a symbol and B
(alternatively A) is an object.

(ii) A and B are both objects which do not
contain terms of the form C(D), and A and B
are not identical.

(iii) A and B are both trains and not identical.

3. Every statement ~(A ∈ B) is an axiom if B is an …

4. If A and B are statements and X is a variable,
then the following are axioms:
(i) (A ⊃ (B ⊃ A))

(ii) ((A ⊃ (B ⊃ C)) ⊃ ((A ⊃ B) ⊃ (A ⊃ C)))

(iii) ((~A ⊃ ~B) ⊃ (B ⊃ A))

(iv) ((∀X)(A ⊃ B) ⊃ (A ⊃ (∀X)B))

(v) ((∀X)A ⊃ S^X_Y A)

In (iv), X must not occur free in A. In (v), Y is
either a variable or a term; however, no free occurrence
of X in A may lie in a sub-statement of A of the form
(∀Z)C, where Z is a free variable in Y.
An occurrence of a variable X in a statement A is
said to be bound if it occurs in some sub-statement of A of
the form (∀X)C. An occurrence which is not bound is called
free.

The symbol S^X_Y A stands for the statement A' obtained
by replacing all free occurrences of X in A by Y.
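The definitions of bound and free occurrences and of S^X_Y A can be sketched as a small recursive model. This is an illustrative Python sketch with hypothetical class names that covers only (A = B), (A ⊃ B) and (∀X)A; the other connectives and (∃X)A would be handled in the same way. It does not check the side condition on axiom (v) that Y be substitutable for X.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """The term C(D)."""
    fn: object
    arg: object


@dataclass(frozen=True)
class Eq:
    """The statement (A = B)."""
    left: object
    right: object


@dataclass(frozen=True)
class Implies:
    """The statement (A ⊃ B)."""
    left: object
    right: object


@dataclass(frozen=True)
class ForAll:
    """The statement (∀X)A."""
    var: Var
    body: object


def term_vars(t) -> FrozenSet[Var]:
    if isinstance(t, Var):
        return frozenset({t})
    if isinstance(t, App):
        return term_vars(t.fn) | term_vars(t.arg)
    return frozenset()  # a symbol


def free_vars(a) -> FrozenSet[Var]:
    """Variables with at least one free occurrence in statement a."""
    if isinstance(a, Eq):
        return term_vars(a.left) | term_vars(a.right)
    if isinstance(a, Implies):
        return free_vars(a.left) | free_vars(a.right)
    if isinstance(a, ForAll):
        return free_vars(a.body) - {a.var}
    raise TypeError(f"not a statement: {a!r}")


def subst_term(t, x: Var, y):
    if t == x:
        return y
    if isinstance(t, App):
        return App(subst_term(t.fn, x, y), subst_term(t.arg, x, y))
    return t


def subst(a, x: Var, y):
    """S^X_Y A: replace every free occurrence of x in a by the term y."""
    if isinstance(a, Eq):
        return Eq(subst_term(a.left, x, y), subst_term(a.right, x, y))
    if isinstance(a, Implies):
        return Implies(subst(a.left, x, y), subst(a.right, x, y))
    if isinstance(a, ForAll):
        # Occurrences of x inside (∀x)C are bound, so they are left alone.
        return a if a.var == x else ForAll(a.var, subst(a.body, x, y))
    raise TypeError(f"not a statement: {a!r}")


alpha = Var("α")
a = Implies(Eq(App("color", alpha), "red"), ForAll(alpha, Eq(App("color", alpha), "red")))
print(free_vars(a))
print(subst(a, alpha, "box"))
```

In the usage example only the first occurrence of α is free, so only that one is replaced by the symbol box.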

5. If A and B are statements and X is a variable,
then the following are axioms:

(i) ((∃X)A ≡ ~(∀X)~A)

(ii) ((A v B) ≡ ((A ⊃ B) ⊃ B))

(iii) ((A & B) ≡ ~(A ⊃ ~B))

(iv) ((A ≡ B) ≡ ((A ⊃ B) & (B ⊃ A)))
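As a quick sanity check on the propositional part, the schemata 4(i) to 4(iii) and the definitions 5(ii) to 5(iv) can be confirmed to be tautologies by enumerating truth values. The Python sketch below does only that; it says nothing about the quantifier schemata 4(iv), 4(v) and 5(i), or about the term-level axioms.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth table of ⊃."""
    return (not a) or b


# Each schema as a Boolean function of the truth values of A, B and C.
schemata = {
    "4(i)":   lambda a, b, c: implies(a, implies(b, a)),
    "4(ii)":  lambda a, b, c: implies(implies(a, implies(b, c)),
                                      implies(implies(a, b), implies(a, c))),
    "4(iii)": lambda a, b, c: implies(implies(not a, not b), implies(b, a)),
    "5(ii)":  lambda a, b, c: (a or b) == implies(implies(a, b), b),
    "5(iii)": lambda a, b, c: (a and b) == (not implies(a, not b)),
    "5(iv)":  lambda a, b, c: (a == b) == (implies(a, b) and implies(b, a)),
}

# A schema is a tautology if it holds under all 8 assignments.
results = {name: all(f(a, b, c) for a, b, c in product([False, True], repeat=3))
           for name, f in schemata.items()}
print(results)  # every value is True
```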

It will be noticed that some statements which were
true in the previous system are now axioms. For instance,
(color((size, big; color, red)) = red) is an axiom. However,
since the recursive function "value" is not defined in the
new system, some other statements, like

(color(first((first, (color, red; size, big); second, hand))) = red),

are not axioms, because

first((first, (color, red; size, big); second, hand))

is not an object but a general term, which is not covered by
rule 1(iii) above. The truth of the above statement will only
follow from the rules of inference given below.

Meanwhile, it may be worthwhile to point out that
although the statement above is not an axiom, its negation is
not one either. Truth and falsity of statements are much more
difficult to test in the new system: a price one pays for
flexibility.
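The difference between these two statements can be made mechanical. The Python sketch below uses hypothetical names and covers only the ground cases of rules 1(i), 1(iii), 2(i) and 2(ii): objects are tuples of (first, second) pairs, symbols are strings, and a term C(D) is a small `App` dataclass. It confirms that (color((size, big; color, red)) = red) is an axiom while neither the nested statement nor its negation is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """The term C(D); objects are tuples of (first, second) pairs, symbols are str."""
    fn: str
    arg: object


def is_object(t) -> bool:
    return isinstance(t, tuple)


def identical(a, b) -> bool:
    """Rule 1(i): objects are identical when their ordered pairs match up to order."""
    if isinstance(a, App) or isinstance(b, App):
        return a == b          # other terms: syntactic identity
    if not (is_object(a) and is_object(b)):
        return a == b          # symbols, or a symbol against an object

    def same(p, q) -> bool:
        return identical(p[0], q[0]) and identical(p[1], q[1])

    return (all(any(same(p, q) for q in b) for p in a)
            and all(any(same(p, q) for p in a) for q in b))


def axiom_1_iii(a, b) -> bool:
    """(A = B) is an axiom by 1(iii): A is C(D), D an object with a unique
    pair whose first element is C, and B is that pair's second element."""
    if not (isinstance(a, App) and is_object(a.arg)):
        return False
    matches = [second for first, second in a.arg if first == a.fn]
    return len(matches) == 1 and identical(matches[0], b)


def contains_app(t) -> bool:
    if isinstance(t, App):
        return True
    return is_object(t) and any(contains_app(x) for pair in t for x in pair)


def is_inequality_axiom(a, b) -> bool:
    """~(A = B) is an axiom by 2(i) or 2(ii)."""
    if isinstance(a, str) and isinstance(b, str):
        return a != b
    if (isinstance(a, str) and is_object(b)) or (is_object(a) and isinstance(b, str)):
        return True
    if is_object(a) and is_object(b):
        return not (contains_app(a) or contains_app(b)) and not identical(a, b)
    return False


big_red = (("size", "big"), ("color", "red"))
red_big = (("color", "red"), ("size", "big"))
pair = (("first", red_big), ("second", "hand"))
nested = App("color", App("first", pair))

print(identical(big_red, red_big))                                     # True
print(axiom_1_iii(App("color", big_red), "red"))                       # True
print(axiom_1_iii(nested, "red"), is_inequality_axiom(nested, "red"))  # False False
```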


c) Rules of inference

Given a set D of statements and a statement A, we
say A is derivable from D if there exists a sequence
S_1, S_2, ..., S_r of statements such that S_r is the same as A and,
for every i (1 ≤ i ≤ r), either S_i is an axiom, or S_i is a
member of D, or S_i is inferred from previous statements
S_j, S_k (j, k < i) by one of the following 5 rules of inference:

(i) From A to infer (∀X)A, where X is a variable.
(ii) From A and (A ⊃ B) to infer B.
(iii) From A and (X = Y), to infer A', where A' is
obtained from A by replacing some occurrences
of X by Y and some occurrences of Y by X.
(iv) From (REP(X) = Y) (where X is a term and Y is
a representative), to infer (REPINV(Y) = X).
(v) From (A = B) to infer (B = A).
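The definition of derivability amounts to a line-by-line check of a proof sequence. The Python sketch below is an illustration with a hypothetical tuple encoding of statements: it implements only rules (ii), (iv) and (v), takes the axiom test as a parameter, and omits generalization (i) and replacement (iii). In the usage example the symbol frank stands in for Frank's object, and REPINV is obtained from a REP statement, the pattern used for naming Frank.

```python
from typing import Callable, Collection, List, Union

# Statements as nested tuples: ("imp", A, B) for (A ⊃ B), ("eq", X, Y) for (X = Y);
# REP(X) is ("REP", X) and REPINV(Y) is ("REPINV", Y); atomic statements are strings.
Stmt = Union[str, tuple]


def by_rule(s: Stmt, earlier: List[Stmt]) -> bool:
    """Can s be inferred from earlier steps by rule (ii), (iv) or (v)?"""
    # (ii) from A and (A ⊃ B) infer B
    if any(("imp", a, s) in earlier for a in earlier):
        return True
    if isinstance(s, tuple) and s[0] == "eq":
        left, right = s[1], s[2]
        # (v) from (A = B) infer (B = A)
        if ("eq", right, left) in earlier:
            return True
        # (iv) from (REP(X) = Y) infer (REPINV(Y) = X)
        if isinstance(left, tuple) and left[0] == "REPINV":
            if ("eq", ("REP", right), left[1]) in earlier:
                return True
    return False


def is_derivation(steps: List[Stmt], target: Stmt, d: Collection[Stmt],
                  is_axiom: Callable[[Stmt], bool]) -> bool:
    """Check that steps S_1 .. S_r form a derivation of target from the set d."""
    if not steps or steps[-1] != target:
        return False
    return all(is_axiom(s) or s in d or by_rule(s, steps[:i])
               for i, s in enumerate(steps))


def no_axioms(s: Stmt) -> bool:
    """Placeholder axiom test (assumption): the schemata are not checked here."""
    return False


name = ("STRING", "father", "of", "susan")
d = {("eq", ("REP", "frank"), name)}
steps = [
    ("eq", ("REP", "frank"), name),     # member of D
    ("eq", ("REPINV", name), "frank"),  # rule (iv)
    ("eq", "frank", ("REPINV", name)),  # rule (v)
]
print(is_derivation(steps, steps[-1], d, no_axioms))  # True
```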

It may be worthwhile at this point to point out how
some of the previously discussed statements are derivable from
certain axioms. By rule 1(iii) of the axiom schemata, both

(first((first, (color, red; size, big); second, hand)) = (color, red; size, big))

and

(color((color, red; size, big)) = red)

are axioms. Since both of the above are derivable, one can replace
(according to rule (iii) of the rules of inference) "(color, red; size, big)"
in the second by "first((first, (color, red; size, big); second, hand))",
the L.H.S. of the first statement, thereby deriving the initial statement.

Again, one can derive (REP((name, frank; sex, man; house,
(size, small; color, blue; look, pretty); age, big)) =
STRING(father, of, susan)) from the concept named "father" and the
statement

((α ∈ father) ⊃ (REP(first(α)) = TIE(father, of,
name(second(α)))))

in a similar manner, in view of some of the axioms discussed
earlier.

Before closing this section it may be worthwhile to
point out that the REP (representation) of the object named
"frank" does not have to be unique. If locations of houses
in cities and professions of people were included in the
Universe, this object might also have STRING(frank, the, barber,
of, seville) as a representation, and one could have statements
like

(REPINV(STRING(father, of, susan)) =
REPINV(STRING(frank, the, barber, of, seville))).
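A small Python sketch of how the naming axiom and REPINV interact, assuming REP facts are kept as a set of (object, representative) pairs so that one object may have several representatives. Objects are nested dicts frozen into a hashable form, a representative STRING(...) is modelled as a plain tuple of symbols, and all helper names are hypothetical. The second representation of Frank is simply asserted, as in the text, rather than derived.

```python
from typing import Set, Tuple


def freeze(obj):
    """Hashable, order-insensitive form of a nested dict object."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    return obj


def is_father(pair: dict) -> bool:
    first, second = pair["first"], pair["second"]
    return (first["age"] == "big" and first["sex"] == "man"
            and first["house"] == second["house"])


# REP facts: (object, representative) pairs; representatives are tuples of symbols.
rep_facts: Set[Tuple[object, Tuple[str, ...]]] = set()


def apply_naming_axiom(pair: dict) -> None:
    """(α ∈ father) ⊃ (REP(first(α)) = TIE(father, of, name(second(α))))."""
    if is_father(pair):
        rep_facts.add((freeze(pair["first"]),
                       ("father", "of", pair["second"]["name"])))


def repinv(rep: Tuple[str, ...]) -> Set[object]:
    """All objects recorded as having rep as a representation."""
    return {obj for obj, r in rep_facts if r == rep}


house = {"size": "small", "color": "blue", "look": "pretty"}
frank = {"name": "frank", "sex": "man", "house": dict(house), "age": "big"}
susan = {"name": "susan", "sex": "girl", "house": dict(house), "age": "small"}

apply_naming_axiom({"first": frank, "second": susan})
# A second, hypothetical representation of the same object (asserted, not derived).
rep_facts.add((freeze(frank), ("frank", "the", "barber", "of", "seville")))

print(repinv(("father", "of", "susan"))
      == repinv(("frank", "the", "barber", "of", "seville")))  # True
```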

An example from the field of character recognition
may motivate some readers more. Assume that the Universe consists
of the different configurations of excitations on a
square array of photo-cells, and suppose one is interested in
all configurations in which all excited photo-cells lie on a
straight line inclined to the horizontal edge of the photo-cell
array at -45°. Call this set "negativediag."

In this Universe each photo-cell determines a property
whose values are called 0 and 1. Each photo-cell is determined
by its coordinates. Hence, the Universe of photo-cells has
two properties corresponding to the X and Y coordinates, which
will be called "first" and "second" here. The values of both
these properties are integers, which have been discussed before
in connection with the description language. The reader will
verify that a typical configuration on a 2x2 array may be
denoted by the object

((first, (head, null; tail, 1); second, (head, null; tail, 1)), 1;
 (first, (head, (head, null; tail, 1); tail, 0); second, (head, null; tail, 1)), 0;
 (first, (head, null; tail, 1); second, (head, (head, null; tail, 1); tail, 0)), 1;
 (first, (head, (head, null; tail, 1); tail, 0); second, (head, (head, null; tail, 1); tail, 0)), 0)

representing the configuration

1 0

1 0
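The object above can be generated mechanically. The Python sketch below uses hypothetical helpers and assumes that objects are tuples of (first, second) pairs and that the cells are listed row by row, as in the text; applied to the 2x2 configuration it reproduces the four ordered pairs above in the same order.

```python
from typing import List, Tuple


def numeral(n: int):
    """Nested (head, ...; tail, ...) object for the binary numeral of n >= 1."""
    obj = "null"
    for bit in format(n, "b"):
        obj = (("head", obj), ("tail", bit))
    return obj


def configuration(grid: List[List[int]]) -> Tuple:
    """Object pairing each photo-cell (first, x; second, y) with its excitation.
    grid[y - 1][x - 1] holds the value of the cell in column x, row y."""
    pairs = []
    for y, row in enumerate(grid, start=1):
        for x, value in enumerate(row, start=1):
            cell = (("first", numeral(x)), ("second", numeral(y)))
            pairs.append((cell, str(value)))
    return tuple(pairs)


config = configuration([[1, 0], [1, 0]])
for cell, value in config:
    print(cell, value)
```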

One can now write a statement which defines the set
"negativediag":

α ∈ negativediag ≡ ((∃β)(∃γ)((β(α) = 1) & (γ(α) = 1) & ~(β = γ))
& (∃δ)(∀γ)((γ(α) = 1) ⊃ ((first, first(γ); second, second(γ); third, δ) ∈ sum)))
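Read procedurally, the statement asks for two distinct excited cells and a single δ equal to first + second for every excited cell. The Python sketch below is an illustration that works directly on the set of excited coordinates rather than on the nested object, and it assumes the second coordinate increases upwards, so that x + y = δ is a line at -45°. As written, the statement does not require the excited cells to be adjacent, and neither does the sketch.

```python
from typing import Set, Tuple


def negativediag(excited: Set[Tuple[int, int]]) -> bool:
    """Mirror of the statement for negativediag over the set of excited cells,
    given as (first, second) = (x, y) coordinates with y increasing upwards."""
    # (∃β)(∃γ)(β(α) = 1 & γ(α) = 1 & ~(β = γ)): at least two excited cells
    if len(excited) < 2:
        return False
    # (∃δ)(∀γ)(γ(α) = 1 ⊃ first(γ) + second(γ) = δ): one common coordinate sum
    return len({x + y for x, y in excited}) == 1


print(negativediag({(1, 3), (2, 2), (3, 1)}))  # True
print(negativediag({(1, 1), (2, 2)}))          # False
print(negativediag({(2, 2)}))                  # False
```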

The language, although developed for describing concepts, has usefulness
in other information retrieval systems. Its use in such
systems has not been investigated, but it may be safe to say
that its capability, even if somewhat curtailed, may be greater
than that of any conjunctive system of descriptors or association
strength networks discussed in the field.

7. Other Description Languages

Set theoretical descriptions have been used for concepts
mostly by workers interested in simulating human cognitive
activity. However, the entire basis of Pattern Recognition as
a phenomenon is set theoretical or, more precisely, logical.
The motivation behind the different methods used in synthesizing
concept learning algorithms often lies in fields like statistics [36]
or linear algebra [37]; however, in every case the final
algorithm for recognizing an object as belonging to a concept
or pattern (after the "learning phase") can be looked upon as
using a compound statement as the description of the concept.
This will be clear if one considers the case of a set of binary
vectors whose components satisfy a linear inequality. The set
{000, 001, 011, 100, 101, 111} of binary vectors xyz, for instance, can
be represented by the linear inequality 1 - y + z > 0, or the
Boolean expression (~y + z), or the statement (y = 0) v (z = 1).
Similar statements can be constructed for cases where the
discriminating functions are non-linear or even when the components
of the vectors come from a continuum. Each component
of the vectors is a property whose values isolate subsets of
the Universe. However, in this latter case the logical expressions
representing these functions need quantifiers to take
care of infinite set-theoretic connectives.
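The equivalence of the three descriptions of this set is easy to confirm by enumerating all eight binary vectors xyz. This Python check is only an illustration of the example, with the inequality read as 1 - y + z > 0.

```python
from itertools import product

target = {"000", "001", "011", "100", "101", "111"}
vectors = ["".join(bits) for bits in product("01", repeat=3)]  # strings xyz


def yz(v: str):
    """Components y and z of the vector xyz as integers."""
    return int(v[1]), int(v[2])


linear = {v for v in vectors if 1 - yz(v)[0] + yz(v)[1] > 0}    # 1 - y + z > 0
boolean = {v for v in vectors if (not yz(v)[0]) or yz(v)[1]}    # ~y + z
statement = {v for v in vectors if v[1] == "0" or v[2] == "1"}  # (y = 0) v (z = 1)

print(target == linear == boolean == statement)  # True
```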

The modes of combination available to Pattern Recognition
schemes based on statistics or linear algebra are
richer than those available to Boolean Algebra. However, very
often the effectiveness of the various modes of combination
dealt with in the literature depends strongly on the initial
measurements (i.e., the input properties or "features"), and
depends on them in an extremely ill-understood way; also, there is
no uniform method for changing one set of algebraic operations
into another to yield new "features" from old ones. It is to
achieve such flexibility, and to tie the description down to
the basic set theoretical structure of the problem, that the
language of Section 6 was developed.

Very little can be said regarding the ultimate
effectiveness of the various algebraic or statistically oriented
languages available for the description of patterns. Some of them
(like linear separation) essentially restrict the capability of
description for the sake of simplicity of description and
"training" [38]. Others, like Braverman's potential functions,
are essentially "open-ended" and can be used (like Boolean
functions) to describe any concept whatever. However, these
latter lead to problems of confidence limits and "generalization."
It may be well to defer discussion of these to the next chapter,
where learning is discussed.

Returning to the discussion of the use of simple
Boolean expressions (or expressions in Propositional Calculus)
as a description language, it has been shown in Section 3 how
the class of describable concepts can be restricted for parsimony,
yielding, say, the class of conjunctive and simple concepts.
Although the class of simple concepts properly contains the
class of conjunctive concepts, not all concepts are simple, and
modes of description have to be available for describing every
concept. The language of the property lists is not adequate
for this. A suitable extension of it has been suggested which
is capable of describing any concept. Conceptions can describe
any concept also. The efficiency of both is severely restricted
for a large class of concepts. However, their ultimate capability
for description is not limited, as it is for perceptron-like
devices, which use hyperplanes as discriminating surfaces.

Two other languages, the CLS by Hunt [39] and EPAM by
Feigenbaum [40], are restricted in their ability to the same extent
as the Conceptions. The relationship between the two has been
discussed by Hunt. It is relevant to discuss here the salient
points of difference between the Conception on the one hand
and the CLS and the EPAM on the other.

The fact that the systems used by Hunt and Feigenbaum
are binary trees while the Conception allows more than two
branches to emanate from the nodes is not a crucial difference.
All three are essentially tree structures; the fact that only
one of the trees is non-binary is easily attributable to the
strong influence that the word "bit" has had on psychologists
since 1948.

There are two crucial differences between the CLS
and EPAM trees and the Conception, however. One lies in the
fact that the name of the concept described by the tree is
placed at the root of the tree in the Conception while it is
placed in the leaves in the other two languages. This looks
like an essentially wasteful feature of the Conception, since
there has to be a different tree for every concept. However,
there are some essential reasons for doing this, as an
analysis of the basic sets involved will show.

For one thing, a property which is relevant to a
concept A may not be relevant to a concept B. This fact can
be used advantageously in the Conception. But when the EPAM
processor, say, is testing an object for membership of B, it
still has to go through a test for this non-relevant property,
just because it was worth testing in testing for A.

There is another, much stronger reason for attaching
concept names to roots of trees. Very often the same concept
turns out to be a sub-concept of two different concepts, in the
sense that there is a concept C, two property values p ∈ P and
q ∈ Q and two concepts A and B such that C = p ∩ A = q ∩ B.
Then, the name C can be placed on the conceptions of A and B
instead of placing the entire C tree twice in the conceptions
of A and B, as would be necessary in the CLS or the EPAM.
(These remarks do not pertain to CEPAM developed by Ernst and
Sherman, which will be discussed later.)

These differences occur essentially because it is overlooked
in the other languages that a specific object can be a
member of more than one concept; hence, one has to attach more
than one name to every leaf if the names of the concepts are to
be attached to leaves. Also, some of the intermediate nodes of
a tree may contain enough information to identify an object
in a concept, while to recognize the same object in one of its
sub-concepts would necessitate going deeper into the tree.

The need for attaching concept names to nodes becomes
even clearer when one needs to define a new test in terms of
old tests in the interest of simplicity of description. In the
Conception, it is a matter of adding a new list into the Conception
of the Universe, while it is impossible in the other
two structures. No flexible language has appeared in the
Pattern Recognition or cognitive process research as a counterpart
of the description language discussed in Section 6.


Various languages, like the languages developed by
Narasimhan and by Kirsch, carry out operations on names
of properties, but the procedures do not have the same
flexibility and uniformity. However, the SIR system of Raphael has
certain similarities with the language of Section 6 which
ought to be discussed.

One of the greatest similarities between SIR and the
language described in Section 6 lies in the structure
of objects. The same property-value pairs are
strung together, and values of properties may be objects
themselves; it is not clear whether names of properties can be

of course, is that the object, in a sense, has greater
efficiency of processing. For instance, let the question be
asked "Is Tom Harry's brother?" (i.e., let tom ∈ brother(harry)
be posed as a theorem). A special processor could
answer this as "yes." However, if the question is asked "Does
Harry have a brother aged 20?", the processor will have to
know that there are objects whose property "name" has values
"tom," "dick" and "don": a fact which is not clear from the
format, and a separate processor would be needed to incorporate
such extra assumptions.

In fact, SIR as originally conceived and implemented
consisted mostly of a series of processors capable of handling
a special class of objects. The syntactic restrictions the
objects were to satisfy were defined more in terms of the
structures of the processors. As a result, certain facts about
objects were easy to describe while others were impossible,
unlike the language described in Section 6, where certain facts
are easy to describe and others merely more difficult to describe.
Also, in the language of Section 6, facts which were
originally difficult to describe can be made easy to describe by
adding new concepts. Most present day description languages
lack this flexibility through expansion.

as SIR, is a First Order Theory in the sense of Symbolic
Logic. Hence, the testing of the truth of certain statements
will, in most general cases, turn out to be a search for
appropriate steps of the proof. This is at present extremely
difficult [41, 42]. On the other hand, many theorems in the
system can be proved by simple processes, as was shown in the
first part of Section 6. Even the addition of certain flexibilities
of the language appearing in the second part of
Section 6 does not vitiate the facility.

The present chapter has discussed the role played by
languages in the description of patterns. It has neglected
the discussion of languages where the basic predicates involve
arithmetic operations, although it is clear both from

CHAPTER V


LEARNING AND GENERALIZATION


1. Introduction

In Chapter IV, the major concern was to develop
languages in which sets of objects could be described in such
a way that a specific object could be tested for membership
in a set in terms of the object's properties. Any technique
of concept learning, whether it be by discriminant functions,
probability estimation or the like, has to have a language in
which expressions can be written to define a set. The major
points of difference between the languages described in Chapter IV
and the more popular ones in the field are the following. First, the
languages described in Chapter IV are essentially non-numerical,
so that the objects do not have to be pre-processed to yield
numerical values of the properties. In the field of Pattern
Recognition, this pre-processing is essentially a phenomenon
left out of the learning and recognition techniques analysed.

It is the belief of the author that this separation of pre-processing
from recognition introduces great difficulties in
the way of answering the most important question in the field
today, "How does one determine the most effective pre-

non-input properties stand for the results of pre-processing
the input properties. Thus the pre-processing is described
in the same format as the patterns are described. Hence, the
"effectiveness" of pre-processors can be discussed in a
uniform way. It was indicated while discussing the language
of Section 4.6 that this does not necessarily remove the
advantages inherent in numerical processing.

description learned in the central case ceases to be met.

one can discuss the modification of P in an environment to
render descriptions succinct.


2. Learning Conjunctive Concepts

The algorithm described in this section was developed
by Pennypacker for developing conceptions (see Chapter IV)
for patterns on the basis of examples of objects
belonging to the patterns [32]. Unlike most experiments conducted
in the field (except those conducted with psychological interest
by Bruner and his followers), the algorithm will be
given the freedom of choosing examples on the basis of past
examples shown by the trainer; also, it will be given the
freedom of asking two other questions, "Is the pattern described
by the following conception completely contained in the pattern
under consideration?" and "Is this a correct conception for
the pattern under consideration?" Both these questions can be
replaced by statistical tests, and such tests may be necessary
in real circumstances. However, the purpose of the investigation
was to establish the logical structure of the algorithm.

The basis of the algorithm lies in the isolation of a set of property values $G = \{p_{1i_1}, p_{2i_2}, \dots, p_{si_s}\}$ such that

$$p_{1i_1} \cap p_{2i_2} \cap \dots \cap p_{si_s} \subseteq X,$$

where X is the pattern being learned, and such that there is no proper subset of G whose intersection is contained in X. If X is a conjunctive pattern, this basic algorithm converges to yield a short conception for X. Otherwise it yields a conjunctive pattern which is a proper subset of X. Examples outside the pattern so obtained and inside X are then interrogated to yield other conjunctive patterns. The process continues till a set of conjunctive patterns is obtained whose union covers X.

The operation of the algorithm needs the ability to obtain conceptions of Boolean functions of patterns having known conceptions, and to test for identity of concepts and containment of one concept in another. It also needs the capability of evaluating properties of objects, given the values of their input properties. Algorithms for doing these have been developed by Pennypacker; some of these have been briefly discussed in the previous chapter.

The operation of the algorithm depends on the following lemmas.

Lemma 5.1: Let $\langle U, \mathcal{P} \rangle$ be an environment, let $X = p_{k_1j_1} \cap p_{k_2j_2} \cap \dots \cap p_{k_sj_s}$ be a conjunctive concept, and let

$$p_{1i_1} \cap p_{2i_2} \cap \dots \cap p_{ni_n} \subset p_{2i_2} \cap \dots \cap p_{ni_n} \subseteq p_{k_1j_1} \cap p_{k_2j_2} \cap \dots \cap p_{k_sj_s},$$

the first containment being proper. Then for no $r$ $(1 \le r \le s)$ is $p_{k_rj_r} = p_{1i_1}$.

Proof: Assume to the contrary; then $p_{k_rj_r} = p_{1i_1}$ for some $r$, so that

$$p_{2i_2} \cap \dots \cap p_{ni_n} \subseteq p_{k_rj_r} = p_{1i_1},$$

whence $p_{2i_2} \cap \dots \cap p_{ni_n} = p_{1i_1} \cap p_{2i_2} \cap \dots \cap p_{ni_n}$, contrary to hypothesis.

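Lemma 5.2: Let $\langle U, \mathcal{P} \rangle$ be an environment and $\{P_1, P_2, \dots, P_n\} \subseteq \mathcal{P}$. Let

$$p_{1i_1} \cap p_{2i_2} \cap \dots \cap p_{ni_n} \subseteq p_{k_1j_1} \cap p_{k_2j_2} \cap \dots \cap p_{k_sj_s},$$

where for each $r$ $(1 \le r \le s)$, $1 \le k_r \le n$ and $p_{k_rj_r} \in P_{k_r}$, and for all $m$ $(1 \le m \le n)$, $p_{mi_m} \in P_m$. Then for each $r$ $(1 \le r \le s)$, $p_{k_rj_r} = p_{k_ri_{k_r}}$.

Proof: Let $k_r = t$. Then

$$p_{1i_1} \cap \dots \cap p_{ni_n} \subseteq p_{ti_t} \quad\text{and}\quad p_{1i_1} \cap \dots \cap p_{ni_n} \subseteq p_{tj_r}.$$

Since the object on the left is not empty, $p_{ti_t} \cap p_{tj_r} \neq \emptyset$. The lemma follows since $P_{k_r} = P_t$ is a partition, so that two of its values which intersect must coincide.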

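Lemma 5.3: Under the hypothesis of Lemma 5.2, if

$$p_{2i_2} \cap \dots \cap p_{ni_n} \not\subseteq p_{k_1j_1} \cap \dots \cap p_{k_sj_s},$$

then for some $r$ $(1 \le r \le s)$, $k_r = 1$.

Proof: Suppose $k_r \neq 1$ for all $r$ $(1 \le r \le s)$. Since $1 \le k_r \le n$ for all $r$, this indicates $2 \le k_r \le n$ for all $r$. By Lemma 5.2 each $p_{k_rj_r} = p_{k_ri_{k_r}}$ is then one of $p_{2i_2}, \dots, p_{ni_n}$, so that $p_{2i_2} \cap \dots \cap p_{ni_n} \subseteq p_{k_1j_1} \cap \dots \cap p_{k_sj_s}$, contradicting the hypothesis.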

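Lemma 5.4: Under the hypothesis of Lemma 5.2, let

$$p_{1i_1} \cap \dots \cap p_{ni_n} \subset p_{k_1j_1} \cap \dots \cap p_{k_sj_s}$$

(the containment being proper), and let $\{p_{1i_1}, p_{2i_2}, \dots, p_{ti_t}\}$ be the set of all terms on the left-hand side such that $p_{mi_m} \neq p_{k_rj_r}$ for any $r$ $(1 \le r \le s)$, the properties being numbered so that these are the first $t$ terms. Then

$$p_{(t+1)i_{t+1}} \cap \dots \cap p_{ni_n} \neq p_{1i_1} \cap \dots \cap p_{ni_n}.$$

Proof: By construction of the set $\{p_{1i_1}, \dots, p_{ti_t}\}$ and by Lemma 5.2, the left-hand side is equal to $p_{k_1j_1} \cap \dots \cap p_{k_sj_s}$. Equality would therefore make the object equal to the concept, which directly contradicts the hypothesis.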

The algorithm can now be justified rigorously. Any object is an intersection of a set $\{p_{ij}\}$ of property values. If this object is properly contained in a conjunctive concept which is to be learned, then the hypothesis of Lemma 5.2 is fulfilled. If one now removes from the set $\{p_{ij}\}$ one value $p_{i_0j_0}$, then one of four things may occur:

(i) The new concept obtained by intersecting the elements of $\{p_{ij}\} - \{p_{i_0j_0}\}$ is the same as the object.

(ii) The new concept as obtained above contains the object properly and is properly contained in the conjunctive concept being learned. This fulfills the condition of Lemma 5.1, and hence $p_{i_0j_0}$ does not occur in the expression for the conjunctive concept to be learned; hence it can be removed from the set $\{p_{ij}\}$ without violating the hypothesis of Lemma 5.2.

(iii) The new concept as obtained above coincides with the concept to be learned. This terminates the learning process.

(iv) The new concept is not contained in the concept being learned. Then Lemma 5.3 holds and $p_{i_0j_0}$ occurs in the expression for the conjunctive concept being learned.
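As an illustration only, the four outcomes can be told apart mechanically once each property value is represented as a finite set of points of U. The Python sketch below assumes that representation; the names `intersect` and `classify_removal`, and the toy environment (input properties a and b, plus a non-input property c whose value c0 isolates one point), are hypothetical and not taken from the text.

```python
from functools import reduce

def intersect(values, universe):
    """Intersection of a collection of property values (each a frozenset of
    points); the empty intersection is taken to be the whole universe."""
    return reduce(lambda acc, v: acc & v, values, frozenset(universe))

def classify_removal(G, value, concept, universe):
    """Report which of cases (i)-(iv) arises when `value` is dropped from G.

    G is the set of property values whose intersection is the current
    object; `concept` is the set being learned (known only to the oracle)."""
    obj = intersect(G, universe)
    new = intersect(G - {value}, universe)
    if new == obj:
        return "i"    # the described set did not grow
    if new == concept:
        return "iii"  # description now coincides with the concept
    if new <= concept:
        return "ii"   # grew but stayed inside: the value is not needed
    return "iv"       # left the concept: the value must be kept

# Hypothetical toy environment on U = {0, 1, 2, 3}: input properties a and b,
# and a non-input property c whose value c0 isolates the point 0.
U = frozenset(range(4))
a0, a1 = frozenset({0, 1}), frozenset({2, 3})
b0, b1 = frozenset({0, 2}), frozenset({1, 3})
c0, c1 = frozenset({0}), frozenset({1, 2, 3})

print(classify_removal({a0, b0, c0}, c0, frozenset({0, 1}), U))   # i
print(classify_removal({a0, b0}, b0, frozenset({0, 1}), U))       # iii
print(classify_removal({a0, b0}, b0, frozenset({0, 1, 2}), U))    # ii
print(classify_removal({a0, b0}, a0, frozenset({0, 1}), U))       # iv
```

Case (i) shows up only because of the non-input property c: dropping c0 leaves the object unchanged, which is exactly the situation the modification discussed below has to handle.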

If case (ii) holds, one can start the process over again by removing a new property value $p_{i_1j_1}$ from the set $\{p_{ij}\} - \{p_{i_0j_0}\}$. If (iv) holds, then $p_{i_0j_0}$ is left in the set $\{p_{ij}\}$ and never removed again. In either case the remaining search is confined to a smaller set. Hence, if on each removal of a $p_{i_0j_0}$ only (ii), (iii) or (iv) holds, the successive removal comes to an end. If it does not come to an end at (iii), then the concept to be learned is not conjunctive.

It is to be noted that the above discussion provides the rationale for Bruner's conservative focussing strategy. Case (i) never occurs in his experiments, since his input properties are all the properties in the environment and the input properties form a full fine structure family (see Chapter IV). In the present case, where the environment contains many non-input properties so that the entire property set is not full, the algorithm needs the modification discussed below.

If (i) above is the case, then other property values have to be removed from the set $\{p_{ij}\}$. If (i) holds for all values so removed, then combinations of two values are removed from $\{p_{ij}\}$ and the process is repeated. At this point fulfillment of (iv) does not yield any result, since the condition of Lemma 5.2 is not necessarily fulfilled any more. However, the fulfillment of condition (ii) is still significant, since the hypothesis of Lemma 5.2 was not involved in the proof of Lemma 5.1.
values whose removal will result in fulfillment of condition (ii). In this way all property values not occurring in the expression of the conjunctive concept are removed in a finite number of operations, and condition (iii) occurs. If, however, the concept being learned is not conjunctive, then condition (iv) occurs on the removal of any property value. In this case a new object is chosen which is contained in the concept being learned but not in the conjunctive concepts learned so far, and the process is repeated. Since in a finite environment any concept is the union of a finite number of conjunctive concepts (see the discussion following Theorem 4.12), the process terminates with the recognition of the concept.

The above discussion constitutes an informal proof of the following theorem.

Theorem 5.5: In a finite environment the algorithm shown in the flow chart of Figure 5.1 terminates in a finite number of steps with the recognition of the concept.
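A minimal Python sketch of the basic loop follows, under simplifying assumptions that are ours rather than Pennypacker's: objects are tuples of input-property values, every combination of values is an object (so case (i) cannot arise and the modification for non-input properties is not needed), and the trainer's containment question is simulated by a set-inclusion test against the known target. All function names are hypothetical, and the code is not a transcription of the flow chart of Figure 5.1.

```python
from itertools import product

def extension(description, universe):
    """Objects satisfying every (property_index, value) pair in description."""
    return {obj for obj in universe
            if all(obj[i] == v for i, v in description)}

def learn_conjunct(seed, contained_in_target, universe):
    """Drop values from a positive seed object one at a time, keeping a value
    only if its removal would take the described set outside the target."""
    description = set(enumerate(seed))
    for item in sorted(description):
        trial = description - {item}
        if contained_in_target(extension(trial, universe)):
            description = trial      # cases (ii)/(iii): value not needed
        # otherwise case (iv): the value stays and is never retried
    return frozenset(description)

def learn_concept(target, universe):
    """Cover the target by conjunctive concepts, one seed object at a time."""
    assert target <= universe, "target must be a set of objects in the universe"

    def oracle(described):           # stands in for the trainer's answer
        return described <= target

    conjuncts, covered = [], set()
    while covered != target:
        seed = min(target - covered)  # a positive example not yet covered
        c = learn_conjunct(seed, oracle, universe)
        conjuncts.append(c)
        covered |= extension(c, universe)
    return conjuncts

U = set(product(range(2), range(3)))  # two input properties, 2 x 3 objects
print(learn_concept({o for o in U if o[0] == 1}, U))  # [frozenset({(0, 1)})]
print(len(learn_concept({(0, 0), (1, 1)}, U)))        # 2: not conjunctive
```

Because a value kept under case (iv) is never retried, a single pass over the seed's values suffices: later removals only enlarge the described set, so a value whose removal once left the target still would.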

It may be pointed out that the finiteness of the algorithm does not in any way assure a short description of the concept learned, because a short description may not exist at all. Also, the process, though finite, may be inordinately long. This also has some extremely adverse effects on the "generalization" of the concept, which will be discussed in a later section. In the next section an algorithm will be discussed which learns simple concepts in a finite number of steps. It utilizes the language discussed in Section 5 and hence can construct simple descriptions for a much richer class of concepts than the conjunctive ones. However, in many realistic cases (for instance, in environments where properties are two-valued) even this excludes many concepts.


$K_0 = K$.

If $K_i = \{t_1, t_2, \dots, t_m\}$ and $X_i = p_{1i} \cap p_{2i} \cap \dots \cap p_{ni}$, then

$$K_{i+1} = \{\, t \mid t = t_j \text{ for some } j \ (1 \le j \le m) \text{ and } t \neq p_{ki} \text{ for any } k \ (1 \le k \le n) \,\}.$$

The following lemma is important.

Lemma 5.7: $K_p = M(X_1 \cup \dots \cup X_p)$.

Proof: Let $p_{ms} \notin K_p$. Then there is some $X_t$ $(1 \le t \le p)$ such that $X_t \subseteq p_{ms}$. But $p_{ms} \in M(X_1 \cup \dots \cup X_p)$ would imply

$$p_{ms} \subseteq \bar X_1 \cap \bar X_2 \cap \dots \cap \bar X_p \subseteq \bar X_t,$$

which leads to a contradiction. Hence $p_{ms} \notin M(X_1 \cup \dots \cup X_p)$, and so $K_p \supseteq M(X_1 \cup \dots \cup X_p)$.

Now let $X_t = p_{1t} \cap \dots \cap p_{nt}$. Then $p_{mt} \notin K_p$ for any $m$ $(1 \le m \le n)$. Hence $X_t \subseteq p_{mn}$ implies $p_{mn} \notin K_p$, or

$$K_p \subseteq \{p_{mn} \mid X_t \not\subseteq p_{mn}\} = \{p_{mn} \mid X_t \cap p_{mn} = \emptyset\} = \{p_{mn} \mid p_{mn} \subseteq \bar X_t\} = M(X_t)$$

(since $X_t$, being an object, is either wholly contained in or disjoint from any property value: see Theorem 4.1). So $K_p \subseteq M(X_t)$ for all $t$ $(1 \le t \le p)$, or

$$K_p \subseteq \bigcap_{t=1}^{p} M(X_t) = M(X_1 \cup \dots \cup X_p)$$

(Theorem 4.14). This, with the previous inclusion, yields the lemma.
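Lemma 5.7 can be spot-checked numerically. The sketch below assumes that a property value is an (index, value) pair and an object a tuple, so that an object lies in the value (i, v) exactly when its i-th entry is v; M is then the set of values met by none of the presented objects. The environment and the function names are hypothetical.

```python
from itertools import product
import random

def all_values(domains):
    """K: every (property_index, value) pair of a finite environment."""
    return {(i, v) for i, dom in enumerate(domains) for v in dom}

def successive_removal(K, presented):
    """K_p: strike out the values of each presented object in turn."""
    K_p = set(K)
    for obj in presented:
        K_p -= set(enumerate(obj))
    return K_p

def M(objects, K):
    """Values of K disjoint from every given object (taken by none of them)."""
    return {(i, v) for (i, v) in K if all(obj[i] != v for obj in objects)}

domains = [range(3), range(2), range(4)]   # hypothetical environment
K = all_values(domains)
universe = list(product(*domains))
rng = random.Random(0)
for _ in range(200):
    presented = rng.sample(universe, rng.randint(0, 6))
    assert successive_removal(K, presented) == M(presented, K)
print("Lemma 5.7 held on every sampled presentation")
```

In this encoding the lemma is almost immediate, which is the point: successive removal and the closed-form M agree, so the learner never needs to keep the presented objects themselves.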
One can thus construct the following algorithm for learning a concept. The algorithm starts with two copies of K, $K^1_0$ and $K^2_0$, in memory. Every time an object $p_{1i} \cap \dots \cap p_{ni}$ is presented as belonging to a concept X, the values $p_{mi}$ $(1 \le m \le n)$ are removed from $K^1_i$ to yield $K^1_{i+1}$. Similarly, any time an object is presented as belonging to $\bar X$, $K^2_i$ is modified to $K^2_{i+1}$. At any stage of learning, if m positive and n negative instances have been presented, then $H(K^1_m) \subseteq X^S$ and $H(K^2_n) \subseteq \bar X^S$.

If X and $\bar X$ are both simple concepts, then the algorithm converges at some value of m and n such that $H(K^1_m) = X$ and $H(K^2_n) = \bar X$. However, if either X or $\bar X$ is not simple, then one has to remain satisfied with the approximation to X given by $X^S$ and $\bar X^S$.
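A minimal Python sketch of the two-copy procedure, assuming the same hypothetical encoding (values as (index, value) pairs, objects as tuples) and taking H(T) to be the set of objects lying in none of the values of T, the reading under which $H(P) = \emptyset$ for every property P, as the text states. The example concept "the first property takes value 0 or 1" is simple but not conjunctive; names and data are illustrative only.

```python
from itertools import product

def all_values(domains):
    return {(i, v) for i, dom in enumerate(domains) for v in dom}

def H(K, universe):
    """Objects that lie in none of the property values in K."""
    return {obj for obj in universe
            if not any((i, v) in K for i, v in enumerate(obj))}

def learn(examples, domains):
    """Keep two copies of K; strike out the values of each positive example
    from K1 and the values of each negative example from K2."""
    K1, K2 = all_values(domains), all_values(domains)
    for obj, positive in examples:
        if positive:
            K1 -= set(enumerate(obj))
        else:
            K2 -= set(enumerate(obj))
    return K1, K2

domains = [range(3), range(3)]
U = set(product(*domains))
X = {o for o in U if o[0] in (0, 1)}        # simple but not conjunctive
K1, K2 = learn([(o, o in X) for o in sorted(U)], domains)
print(H(K1, U) == X, H(K2, U) == U - X)      # True True
```

Only the two value sets are kept between presentations, so in this sketch the procedure is adaptive in the sense discussed later in the chapter.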


A test has been developed by Windeknecht and Snediker [33, 34] to find out if a concept and its complement are both simple. The test depends on the following lemma.

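Lemma 5.8: $X^S = X$ and $\bar X^S = \bar X$ if and only if $X^S \cap \bar X^S = \emptyset$.

Proof: The "only if" part is immediate. For the "if" part one notes that $X \cup \bar X \subseteq X^S \cup \bar X^S$. But $X \cup \bar X = U$. Hence $X^S \cup \bar X^S \supseteq U$, yielding $X^S \cup \bar X^S = U$. Hence $X^S = U - \bar X^S$ from the hypothesis, or $X^S \subseteq U - \bar X = X$, since $\bar X \subseteq \bar X^S$. But by Theorem 4.10, $X \subseteq X^S$, whence $X = X^S$. $\bar X = \bar X^S$ follows similarly.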


The application of this test depends on a test for
$H(K^1_m) \cap H(K^2_n)$ being empty or, according to Theorem 4.14, for $H(K^1_m \cup K^2_n) = \emptyset$. The test for this is not straightforward; given a subset $T \subseteq K$, it may not be easy to find out if $H(T) = \emptyset$. Clearly, for any property $P \in \mathcal{P}$, $H(P) = \emptyset$. Also, if $T \supseteq P$, then $H(T) \subseteq H(P) = \emptyset$. However, there may be some T which does not contain any $P \in \mathcal{P}$ as a subset and yet $H(T) = \emptyset$. A set of the R values in the example of Chapter IV is such a case.
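When the universe is small enough to enumerate, the condition of Lemma 5.8 can be tested by brute force. The sketch below does exactly that; it sidesteps rather than reproduces Snediker's procedure, and it assumes that every object has been presented, so that $K^1$ and $K^2$ have reached their final values. Names and encoding are hypothetical.

```python
from itertools import product

def H(K, universe):
    """Objects that lie in none of the property values in K."""
    return {obj for obj in universe
            if not any((i, v) in K for i, v in enumerate(obj))}

def remaining_values(objects, domains):
    """K with the values of every given object struck out."""
    K = {(i, v) for i, dom in enumerate(domains) for v in dom}
    for obj in objects:
        K -= set(enumerate(obj))
    return K

def both_simple(X, domains):
    """Lemma 5.8 test by enumeration: X and its complement are both simple
    iff H(K1 u K2), which equals H(K1) n H(K2), is empty."""
    U = set(product(*domains))
    K1 = remaining_values(X, domains)
    K2 = remaining_values(U - X, domains)
    return not H(K1 | K2, U)

domains = [range(2), range(2)]
print(both_simple({(0, 0), (0, 1)}, domains))   # True: "first value is 0"
print(both_simple({(0, 0), (1, 1)}, domains))   # False: the "diagonal"
```

The enumeration over U is what makes this easy; the difficulty noted in the text is precisely that of deciding $H(T) = \emptyset$ without listing every object.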
A rather involved procedure has been developed by Snediker for the test. Rather than describing the test in detail here, it may be more worthwhile to discuss the effectiveness of the procedures discussed here and in the previous sections in realistic situations, and to compare them with other well-known Pattern Recognition techniques.
4. Problems of Learning and Feature Extraction

A study of either of the methods of learning in the two previous sections is very illuminating, in that it brings to the attention of the reader some of the major difficulties in the way of pattern learning and points out some of the important requirements for an effective pattern learning technique.

It will be noticed in the case of both methods that they are most effective in learning patterns (or concepts) in certain classes. The Pennypacker technique, although an effective procedure for all concepts, ensures rapid convergence only for conjunctive concepts. The Windeknecht-Snediker technique converges to the correct concept only if the concept is simple. In the Pennypacker technique, however, there is a technique for finding out when a concept is not conjunctive and modifying the algorithm to take account of this fact. In the Windeknecht-Snediker algorithm, the corresponding test indicates, not whether the concept being learned is simple, but whether both it and its complement are simple. No algorithm has been developed which would learn any concept as a union of simple concepts in a way analogous to the Pennypacker algorithm.

However, the Pennypacker algorithm takes certain liberties which are not used by any other algorithm known to us. It asks the experimenter questions about the inclusion relationship between the concept being learned and concepts described by the algorithm. The Bruner conservative focussing strategy, on which the Pennypacker algorithm is based, did not allow these liberties, although it did envisage questions from the subject regarding memberships of specific objects in the concept being learned. The extra liberties were necessitated by the fact that, unlike in Bruner's case (and the case of most psychological work after him), it was not assumed that the input properties of the real environment are full. This invalidates some of the methods used in the psychological experiments for obtaining new objects from a focus object.

It is difficult to say how many of the advantages of the Pennypacker algorithm over the Windeknecht-Snediker algorithm would remain if the extra liberties were taken away. Just as one needs to develop methods for asking Bruner-type questions ("membership rather than inclusion") in the environment envisaged by Pennypacker, methods need to be developed also for modifying the Windeknecht-Snediker algorithm to the cases where the concepts learned are non-simple. In any case, however, since there are more simple concepts in an environment than there are conjunctive ones, the Windeknecht-Snediker method ought to be more effective in general. On the other hand, in environments where all properties have only two values, all simple concepts are conjunctive. In this case the relative advantages disappear.

The weaknesses and strong points of the learning methods discussed in this chapter may be used to develop a set of criteria for the evaluation of concept learning methods in general. In the previous paragraphs the methods of this chapter have been discussed on the basis of the following questions.


1. How rich is the class of concepts any one of
whose members can be learned by this method?

2. How rich is the class of concepts any one of
whose members can be learned efficiently by
this method?

3. How rich is the class of concepts whose
descriptions are succinct when expressed in the
language envisaged by the algorithm?

4. Given the interpretation of an environment as a
real pattern learning situation, how many
patterns to be learned can be expected to be
members of the classes described in questions 1,
2 and 3 above?

It can be seen that the class described in question 1 above contains the class described in question 2. (Nothing can be learned efficiently unless it is learned.) In the case of many methods, this latter class coincides with the class described in question 3. However, this may not be true for all methods and languages. A mathematical study of this point cannot be attempted unless precise and acceptable definitions of the words "succinct" and "efficient" are given. This will not be attempted here.

Question 4 perhaps needs some clarification, since it refers not to an abstract entity called the "environment" in the discussion, but to the unformalizable thing called "real life," and to the way one abstracts it to an "environment" in the technical sense. In question 4, the words "patterns to be learned" refer to "real life," as, for example, the class of "all roman letters projected on a grid of photo-cells," while the "classes described in questions 1, 2 and 3" refer to the describable classes after a class of properties has been abstracted and used in a mathematical system. If in "real life" one were called upon to learn all possible concepts (for instance, if learning the class containing "all lower case 'a's, upper case 'Q's and all symmetrical figures" were as necessary as learning the class of all upper case "S"s), the answers to question 4 would coincide with the answers to questions 1, 2 and 3 respectively. However, this is often not true.

It has already been seen that in the case of the Pennypacker technique the class described by question 1 is "the class of all concepts." The class described by questions 2 and 3 is "the class of all conjunctive concepts." Of course, how rich this latter class is depends on the richness of the family of properties in the environment. This latter point will be discussed presently. Meanwhile, it may be worth pointing out that if one restricts oneself to a family of input properties (like, say, "the excitation level of each photo-cell in the grid"), the class of conjunctive concepts is not very rich, especially with respect to "real life."


It ought to be pointed out, however, that the flexibility of the languages described in Chapter IV is such that definitions of new properties are extremely easy to incorporate in the language. This can be done, moreover, with respect to the class of patterns described in question 4. No efficient algorithm exists for introducing such new properties, but certain heuristics can be considered. This is done below with respect to the Universe exemplified in Chapter IV.

If the environment consists of the two properties p and q, then the only conjunctive concepts are the ten values of p and q and the twenty-five objects. Let us now assume that the concept A = {3, 4, 13, 14, 15, 16, 17, 18, 1, 2, 11, 12} has to be learned. The only way the Pennypacker algorithm could learn the concept would be as a union of exemplars. This would render the learning process extremely inefficient, and the conception of the pattern learned would also be unwieldy. The same would be the case with respect to learning the concepts B = {5, 6, 7, 8, 9, 10, 19, 20}, C = {21, 22, 26, 27} and D = {23, 24, 25, 28, 29, 30}. However, at this point it could be realized (if we had an algorithm strong enough to do it) that $A \cup B$ (or $T_1$, so far unknown), $C \cup D$ (or $T_2$) and $U - T_1 - T_2$ could be used as a property of the environment, and so could $A \cup C$ (or $R_1$), $B \cup D$ (or $R_2$) and $U - R_1 - R_2$. This would yield $A = R_1 \cap T_1$ and $C = R_1 \cap T_2$, at a considerable increase in succinctness of description. However, this succinctness would be purchased at the expense of storing $T_1$, $T_2$, $R_1$, $R_2$, $U - T_1 - T_2$ and $U - R_1 - R_2$ in the description list of the Universe. Such property generation, then, can only be justified if it yields short descriptions of many concepts and leads to efficient learning of concepts encountered later. It also has profound significance with respect to the "generalizing ability" of the learning algorithm, as will be shown in the next section.

The reader will note that the concepts $T_1$, $T_2$, $R_1$ and $R_2$ as defined above are concepts which would not have been learned from examples. They would have been internally generated to facilitate the storage of concepts which have been learned from examples. Such concepts are generally called "features" in the literature, and processes which isolate them are called "feature extraction." In this book, the term "concept formation" has also been used for this phenomenon.

It will be worthwhile at this point to consider some of the other learning algorithms in the literature and how they stand with respect to the presently described methods.

Because of the similarity of the description languages, the first methods that come to mind are EPAM and CSL-I [30]. The comparative advantages of the languages have been discussed earlier. The major point to be made about EPAM is really in connection with the limitations of the EPAM tree. Because the name of the concept occurs at the root of the tree, two non-disjoint concepts cannot be very well described in an EPAM tree. This puts essential restrictions on EPAM as a learning algorithm; however, a recognizer using the EPAM tree as the description makes an extremely efficient property evaluator.

Some of the drawbacks of the EPAM net have been removed by Ernst and Sherman [51]. Their work has enabled the building of a description language incorporating some of the highly desirable characteristics of the predicate calculus language described in Section 6 of the previous chapter. Since the exact form of the Ernst and Sherman language is not completely formalized, a detailed description of the language will not be given here. However, it may be worthwhile to point out a few important characteristics.

The Universe of Discourse of the language has two properties, "Ex" and "Na." The values of "Ex" (acronym for "exemplar") are "objects" in the sense discussed in Section 6 of Chapter IV. The values of "Na" are concept names. The EPAM-type tree describes a single concept consisting of exemplar-name pairs such that each exemplar belongs to the concept having the paired name. Thus, if an object X belongs to both the concepts A and B, then in the Ernst-Sherman language the objects (Ex, X; Na, A) and (Ex, X; Na, B) would both belong to the concept described by the tree. The other major departure of the Ernst-Sherman tree from EPAM lies in the fact that the test nodes of the tree may contain statements of the form "term ∈ term" as well as "term = term," while the conventional EPAM test nodes consist of the latter type only. This enables the language to have some of the advantages of the language of Chapter IV, Section 6. As a result, the Ernst-Sherman learning algorithm can make use of previously learned concepts in describing new concepts, and thus shows an important aspect of truly adaptive behavior.

Like most learning techniques based on Boolean algebraic methods (the Windeknecht-Snediker technique being a notable exception), the learning algorithm is most effective in learning conjunctive concepts; however, like the Pennypacker algorithm, it can learn any concept. Moreover, it can learn concepts whose descriptions involve statements of the form "term ∈ term."

The CSL-I technique of Hunt, learning with the description-tree developed by him, has one capability which the Pennypacker algorithm lacks: it can learn succinct descriptions of concepts whose complements are conjunctive. To do this, it has to store in memory all the objects shown in the concept and its complement, instead of modifying the description with each new presentation of an exemplar, as the Pennypacker algorithm and its parent, Bruner's conservative focussing strategy, do. Unlike the Pennypacker method, CSL-I has no method for using non-input properties in the description. As a matter of fact, the advantages of having non-input properties (an advantage which is used by all human beings) seem to have been completely neglected in all psychologically oriented structures of Pattern Recognition that the present author has come across.

On a superficial study, it might appear that the numerical techniques of Pattern Recognition often discussed in the literature are far stronger than the ones discussed here. As a matter of fact, there is a tendency to include in the field of Pattern Recognition only techniques based on the theory of vector spaces and probability. The study and development of flexible description languages, as done here, often seems to be placed outside the pale of the field of Pattern Recognition. This is extremely hard to understand in view of the constant bemoanings in the Pattern Recognition field regarding the elusive nature of the "feature extraction" problem, which is intimately associated with the basic predicates of description languages.

In  what  follows,  a  short  discussion  will  be  given  of 
the  present  author's  interpretation  of  the  methods  and  results 
in  the  field  of  numerical  Pattern  Recognition. 

As has been pointed out before, for these methods to be effective one has to have properties whose values can be represented as real numbers. Thus, in any environment with a finite number n of input properties (and no distinction has yet been attempted between input and non-input properties), each object (many authors prefer to call the objects "patterns," but it will be safer here to hold to a uniform terminology) is represented by a vector in the space of n-tuples of reals. The learning algorithms, on the basis of a list of objects tagged by their membership in a given concept C, construct a real function f of n variables (the "discriminant function") such that, for a large number of objects x encountered or expected to be encountered, f(x) is positive if and only if the object belongs to C. In symbols,

$$f(x) > 0 \iff x \in C.$$

Just as in the case of the algorithms described previously, the form of the function f is restricted to a class, at least by considerations of efficiency of recognition. That is, for some of the algorithms the class of concepts described in question 1 is restricted, and in others this class is unrestricted while the classes described in questions 2 and 3 are restricted and, in most cases, identical. The class described in question 4 can only be considered on the basis of experimentation and the results of the experiments. Different methods have varied, both with respect to the quality of their results and the conclusiveness of the experiments.

Comparisons of these different learning techniques are generally made on the basis of the operational mode of the learning algorithm. One criterion for this is whether a technique is adaptive: that is, whether the function f, starting from an arbitrary initial value, is modified by each tagged object in succession, so that the objects do not have to be stored for processing en masse, or whether the algorithm stores all the tagged objects and constructs f on the basis of the entire set.
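To make the distinction concrete, the sketch below contrasts two stand-in procedures chosen purely for illustration (neither is attributed to any author cited here): a nearest-class-mean discriminant, which needs all tagged objects at once, and the classical perceptron correction rule for a linear f, which adjusts f after each object. The data and names are hypothetical.

```python
def nearest_mean_discriminant(tagged):
    """Non-adaptive: every tagged object is stored and f is built from the
    whole set at once.  f(x) = |x - m_B|^2 - |x - m_A|^2, where m_A and m_B
    are the means of the objects tagged as in C and as not in C."""
    A = [x for x, in_c in tagged if in_c]
    B = [x for x, in_c in tagged if not in_c]

    def mean(points):
        return [sum(coord) / len(points) for coord in zip(*points)]

    def sq_dist(x, m):
        return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

    m_a, m_b = mean(A), mean(B)
    return lambda x: sq_dist(x, m_b) - sq_dist(x, m_a)


def perceptron_discriminant(tagged, epochs=20):
    """Adaptive: f(x) = w . x + b starts at zero and is corrected by each
    tagged object in turn (the list is re-read for several passes here
    only to keep the example short)."""
    w = [0.0] * len(tagged[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, in_c in tagged:
            sign = 1 if in_c else -1
            if sign * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


tagged = [((2, 2), True), ((3, 1), True), ((-1, -2), False), ((-2, 0), False)]
f_batch = nearest_mean_discriminant(tagged)
f_adapt = perceptron_discriminant(tagged)
print(f_batch((2.5, 1.5)) > 0, f_adapt((2.5, 1.5)) > 0)   # True True
print(f_batch((-1.5, -1)) > 0, f_adapt((-1.5, -1)) > 0)   # False False
```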
Another important criterion for distinction may be "motivation," i.e., the basis for choosing the class of functions to be constructed. In some cases this function is generated by estimating the parameters of a set of probability distributions [52, 53]. That is, it is assumed that the concept C and its complement $\bar{C}$ are such that there exist two distributions p and q, characterized by a vector of parameters $\theta$, such that the function f has the form

$f(x) = \frac{p(x;\theta)}{q(x;\theta)} - 1$

for some parameter vector $\theta$. The class of f is determined by the forms of p and q and by the allowed range of choice of the vector $\theta$. The forms for p and q are chosen in one of two ways. One is the worker's belief (founded, hopefully, on empirical data) that the distributions p and q are adequate. The other is the fact that estimating the parameters is computationally feasible for a large set of tagged objects if p and q are assumed to have some given form. Unfortunately, reality does not often conform to the conveniences or limitations of the theoretician.
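As an illustration of the estimation approach, the sketch below assumes that p and q are one-dimensional normal densities. Under that assumption, $\theta$ consists of a mean and a variance for each density, estimated from the tagged objects by maximum likelihood. The normality assumption, the sample values and the function names are illustrative choices, not taken from the works cited.

```python
import math
from statistics import fmean, pvariance
from typing import Callable, Sequence, Tuple


def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood estimates (mean, variance) of a 1-D normal law."""
    mu = fmean(samples)
    return mu, pvariance(samples, mu)


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def make_f(tagged_c: Sequence[float], tagged_not_c: Sequence[float]) -> Callable[[float], float]:
    """Estimate theta for p (objects in C) and q (objects in the complement); return f = p/q - 1."""
    theta_p = fit_gaussian(tagged_c)
    theta_q = fit_gaussian(tagged_not_c)

    def f(x: float) -> float:
        return normal_pdf(x, *theta_p) / normal_pdf(x, *theta_q) - 1.0

    return f


f = make_f([4.8, 5.1, 5.3, 4.9, 5.4], [1.9, 2.2, 2.5, 1.7, 2.1])
```

If the true distributions are not normal, the fitted f can misclassify badly. This is the gap between convenience and reality noted above.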

Another basis for the choice of the class of functions f may be dictated by certain "distance functions." That is, one starts with the axiom that there is a metric $\rho$ on the space of n-tuples. If A is the set of all objects tagged as belonging to C and B the set of all objects tagged as belonging to $\bar{C}$, then f is often defined by
$f(x) = \min_{y \in B} \rho(x,y) - \min_{y \in A} \rho(x,y)$

or

$f(x) = \overline{\rho(x,y)}_{y \in B} - \overline{\rho(x,y)}_{y \in A}$

the bar denoting average value. Again, the class of f is determined by the investigator's choice of $\rho$, hopefully on some rational or empirical basis. Often the classes of f chosen by different methods turn out to be the same.
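Both distance-based forms can be sketched as follows. The sketch assumes the Euclidean metric for $\rho$; any metric could be substituted. The sign is arranged so that f(x) > 0 marks membership in C, following the convention f(x) > 0 ⇔ x ∈ C. The sample sets A and B are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def f_nearest(x: Point, A: Sequence[Point], B: Sequence[Point], rho: Metric = math.dist) -> float:
    """Distance to the nearest object tagged as not in C minus distance to the nearest tagged as in C."""
    return min(rho(x, y) for y in B) - min(rho(x, y) for y in A)


def f_average(x: Point, A: Sequence[Point], B: Sequence[Point], rho: Metric = math.dist) -> float:
    """Same comparison with average distances in place of minima."""
    return sum(rho(x, y) for y in B) / len(B) - sum(rho(x, y) for y in A) / len(A)


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical objects tagged as belonging to C
B = [(5.0, 5.0), (6.0, 5.0)]              # hypothetical objects tagged as belonging to the complement
```

Changing `rho` changes the class of f, which is exactly where the investigator's choice enters.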

A large number of authors restrict the class of f directly, without reference to any statistical or metric criteria. To the present author's mind, this is no less justifiable than choosing the class on the basis of the faiths discussed above. The most popular form, of course, is the linear one [54]:

$f(x) = a \cdot x + b$

where a is a vector and b a real number. The class of concepts describable by these linear functions (the "linearly separable patterns") is much richer than the class of functions discussed in the previous sections. Even so, it is still a very small fraction of the class of all possible concepts, even when the vector space under consideration is the finite space of all possible binary sequences. What is worse, experimental evidence shows that the linear class is inadequate even as a model of the class of concepts described by question 4, unless the set of properties defined by the components of the vector is "adequately chosen." There is no uniform way of choosing the "adequate" representation.
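The smallness of the linearly separable class can be checked by brute force for a few variables. The sketch below enumerates integer weight vectors with entries in {-2, ..., 2} together with half-integer thresholds. It records the distinct truth tables obtained over the binary sequences of length n.

The weight bound is an assumption that we believe suffices for n ≤ 3. Under it, the sketch should yield 14 of the 16 Boolean functions of two variables; only "exclusive or" and its complement are missing. It should also yield 104 of the 256 functions of three variables. The proportion keeps shrinking as n grows.

```python
from itertools import product


def count_linearly_separable(n: int, max_weight: int) -> int:
    """Count Boolean functions of n binary inputs realised as [a . x + b > 0]
    with integer a_i in [-max_weight, max_weight] and half-integer b."""
    points = list(product((0, 1), repeat=n))
    realised = set()
    for a in product(range(-max_weight, max_weight + 1), repeat=n):
        # a . x is an integer in [-n*max_weight, n*max_weight], so these
        # half-integer thresholds produce every distinct cut
        for k in range(-n * max_weight - 1, n * max_weight + 1):
            b = k + 0.5
            truth_table = tuple(sum(ai * xi for ai, xi in zip(a, x)) + b > 0 for x in points)
            realised.add(truth_table)
    return len(realised)


for n in (1, 2, 3):
    print(n, count_linearly_separable(n, max_weight=2), "of", 2 ** 2 ** n)
```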




The class of f has been enriched by many workers by including non-linear functions, especially polynomials of large degree. The only major difficulty with such a choice lies in the very large number of coefficients needed for an adequate description of the concepts. This difficulty is analogous to the cases where the Pennypacker technique learns a concept as the union of an inordinately large number of conjunctive concepts. Another analogous case arises where the description is taken to be "piecewise linear," i.e., where the description of C takes the form

$x \in C \iff B\big(f_1(x) > 0,\; f_2(x) > 0,\; \ldots,\; f_p(x) > 0\big)$

where B is a logical combination of the statements $\{f_i(x) > 0 \mid 1 \le i \le p\}$ and each $f_i(x)$ is a linear function. A subclass of the class of functions so describable consists of those describable by the so-called "two-layer nets." In these nets B yields a linearly separable function, so that the statement above can be rewritten

$x \in C \iff \sum_{i=1}^{p} a_i \,\mathrm{sgn}[f_i(x)] + a_0 > 0$

where $\mathrm{sgn}[t] = 1$ if $t > 0$ and 0 otherwise. "Multi-layer nets" can be similarly constructed, yielding richer classes of describable concepts.
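A small two-layer net shows how the piecewise linear form can describe a concept that no single linear function can: the "exclusive or" of two binary properties. The first-layer functions f_1 and f_2 and the weights a_1, a_2 and a_0 below are hand-picked illustrative values, not learned ones.

```python
from itertools import product
from typing import Sequence


def sgn(t: float) -> int:
    """sgn[t] = 1 if t > 0 and 0 otherwise, as defined in the text."""
    return 1 if t > 0 else 0


def f1(x: Sequence[int]) -> float:
    return x[0] + x[1] - 0.5   # positive when at least one property holds


def f2(x: Sequence[int]) -> float:
    return x[0] + x[1] - 1.5   # positive when both properties hold


a = (1.0, -1.0)  # second-layer weights a_1, a_2
a0 = -0.5        # constant term a_0


def in_concept(x: Sequence[int]) -> bool:
    return a[0] * sgn(f1(x)) + a[1] * sgn(f2(x)) + a0 > 0


truth = {x: in_concept(x) for x in product((0, 1), repeat=2)}
```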

When one considers adaptive techniques for evaluating the coefficients that appear in the representations of any of the schemes described above, an important question arises regarding the "convergence" of the training scheme. Convergence proofs are known only for some of the algorithms based on statistical estimation [53] and for algorithms based on linear separability [55]. The earlier algorithms were proven to converge only in cases where the tagged objects came from a linearly separable concept and its complement. Algorithms have been suggested recently that also indicate failure when the concept is not linearly separable [56]. In many cases, algorithms are introduced on the basis of empirical evidence that they converge "in many cases." Nothing is known regarding the convergence of algorithms for "multi-layer nets," although empirical algorithms have been used in the literature for designing these, both for Pattern Recognition and for game-playing.
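The convergence question can be made concrete with the classical fixed-increment perceptron rule, used here as a representative adaptive scheme for linear discriminants. On a linearly separable concept, such as the conjunction of two binary properties, it stops after finitely many corrections. On "exclusive or" it never stops; the cap `max_epochs` (an arbitrary choice) merely halts it. Such a cap is only a heuristic. It is not one of the failure-detecting algorithms cited above.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (binary vector, tag +1 for C or -1 for the complement)


def perceptron(samples: Sequence[Sample], max_epochs: int = 100) -> Optional[Tuple[List[float], float]]:
    """Fixed-increment perceptron for f(x) = a . x + b.

    Returns (a, b) after the first full pass with no misclassification,
    or None if max_epochs passes go by without one."""
    n = len(samples[0][0])
    a, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, tag in samples:
            if tag * (sum(ai * xi for ai, xi in zip(a, x)) + b) <= 0:
                a = [ai + tag * xi for ai, xi in zip(a, x)]  # move the hyperplane toward x
                b += tag
                errors += 1
        if errors == 0:
            return a, b
    return None


points = list(product((0, 1), repeat=2))
and_samples = [(x, 1 if x[0] and x[1] else -1) for x in points]
xor_samples = [(x, 1 if x[0] != x[1] else -1) for x in points]
```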

As has been indicated above, another major difficulty with non-linear or piecewise linear discriminant functions lies in the extremely large number of coefficients to be stored. (To the present author's mind, these are the only functions that hold any promise of success.) An equally important consideration, closely connected with this, is the "generalizing ability" of the discriminant functions formed from these coefficients. As was said at the beginning of Chapter IV, our discussion has been limited to learning and describing concepts without any reference to the phenomenon of generalization.

Some attempts will be made towards discussing generalization in the next section. The discussion will attempt to bring out the importance of appropriate description languages and "features."
5. Generalization - "Concept Formation" and Languages

It has been seen in the previous section that concept formation, or feature extraction, plays a very important role in simplifying the expressions which describe a concept. There has also been a belief that, somehow, extracting the "correct" features makes subsequent learning easier. A related belief is that once one possesses a correct set of features (i.e., has formed the right concepts), one can generalize well from the encountered tagged objects. One can then recognize later objects with a high degree of confidence, based on the descriptions formed by a learning program. In the absence of good features, "rote learning" seems to be the only possible learning method, and one cannot generalize well from a description learned "by rote."

In what follows, a preliminary effort will be made to provide a rough mathematical framework that gives meaning to the terms used above and justification for the beliefs indicated.

The very possibility of generalization indicates that the class of all concepts to be recognized (the class described by question 4 in the previous section) is restricted to a subset of the class of all concepts. To make this point clear, it may not be necessary to take the most general environment. It will suffice, as an example, to take a Universe having a full fine structure family of input properties, with n properties in the family and m values of each property. The main concern here will be the richness of the class of concepts rather than the simplicity of their description; hence, non-input properties need not be considered.


There are $m^n$ objects in this environment and hence $2^{m^n}$ possible concepts. Suppose a specific object, $X = p_{1x_1} \cap \cdots \cap p_{nx_n}$, is known to belong to a specific (unknown) concept. Then those concepts to which X does not belong are eliminated from those under consideration, and the concept to be learned is known to belong to one of $2^{m^n - 1}$ possible concepts. In general, suppose $k_1$ objects are presented to a learning algorithm as belonging to a concept and $k_2$ objects are presented as belonging to the complement of the concept. Then there are $2^{m^n - k_1 - k_2}$ possible choices for the concept. This number, it will be noticed, does not reduce to 1 (to "correct" learning) till $k_1 + k_2 = m^n$, i.e., till every object in the Universe has been presented.
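The count $2^{m^n - k_1 - k_2}$ can be verified by brute force in a tiny Universe. The sketch enumerates every concept (every subset of the $m^n$ objects). It keeps those that contain all objects tagged as members and none of the objects tagged as non-members. As an assumption for illustration, objects are represented by tuples of property values.

```python
from itertools import product
from typing import Iterable, Tuple

Obj = Tuple[int, ...]  # one value (0 .. m-1) for each of the n properties


def consistent_concepts(m: int, n: int, positives: Iterable[Obj], negatives: Iterable[Obj]) -> int:
    """Brute-force count of concepts containing every positive and no negative object."""
    objects = list(product(range(m), repeat=n))
    index = {obj: i for i, obj in enumerate(objects)}
    pos = {index[o] for o in positives}
    neg = {index[o] for o in negatives}
    count = 0
    for mask in range(2 ** len(objects)):  # each mask is one concept
        members = {i for i in range(len(objects)) if (mask >> i) & 1}
        if pos <= members and not (neg & members):
            count += 1
    return count


# m = 2 values and n = 2 properties: 4 objects and 16 concepts in all.
remaining = consistent_concepts(2, 2, positives=[(0, 0)], negatives=[(1, 1)])
```

Each tagged object halves the number of surviving concepts, and nothing short of tagging every object singles one out.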

A restriction of the class of concepts to be learned, then, seems essential. Such a restriction leads to extremely fast convergence (exemplified, for instance, by the Pennypacker Algorithm in the case of conjunctive concepts). However, ad hoc restrictions stand very little chance of "standing up to reality." This is indicated by the results of experiments on the perceptron, for example, and even by what one can foresee in the future for the CSL or the Pennypacker and Snediker Algorithms, if indiscriminately applied. The restriction has to be learned, just as the concepts themselves have to be learned.


The last sentence has to be pursued with some care. It will be noticed that the learning of a concept consists of the learning of the union of a class of objects. The learning of a restriction, on the other hand, consists of the learning of a class of concepts. The language of Section 6, Chapter IV, is adequate for describing both sets and classes of sets. It does so glibly, in fact, so that unless $a \notin a$ is introduced as an axiom, contradiction will result! Even so, it is probably premature to suggest that both kinds of learning go on in the same language. Much better understanding of "second level" learning (learning of classes) will be needed before that.

For the present, it will be assumed that there exist certain concepts which may be used in constructing simple descriptions of the concepts that are learned from tagged objects, even though one is not called upon to learn these concepts themselves through tagged objects. Second level learning (or "feature extraction" or "concept formation") can be thought of as consisting in the recognition of the former concepts. In Section 4, an example was given to indicate how this phenomenon may possibly be made into an algorithm. Very little research has gone in this direction. The extraction of masks, as done by Uhr and Niellson, is an effort in this direction. However, it is strongly biased towards character recognition and can fail when tried on more ambitious projects (see, for instance, BOGART [59]).

It may be somewhat easier to express the thoughts, and to reduce the chance of misunderstanding, if the above paragraph is interpreted formally. This will be done in the next few paragraphs. The reader is warned that the only reason for the formalism here is precision; no deeper insight results from it immediately.

Let <U, P, P> be a real environment, where P is the entire class of input properties. Let there also be given a class F of specific modes of combining sets to obtain new sets. These modes may be operations like union and complementation, or may be linear or non-linear threshold schemes. One now defines an ordering O(P, F) on the class $C_P$ of concepts. If the concept $C_1$ is lower than the concept $C_2$ in this order, then $C_1$ has a simpler description than $C_2$.

Given a class of concepts $C \subseteq C_P$, P will be called satisfactory for C if every element of C ranks low in the order O(P, F). If P is not satisfactory, then a set of concepts $C_0$ will be said to be concepts formed in view of C if $P \cup \{(c, \bar{c}) \mid c \in C_0\}$ is satisfactory for C.

Evidently, the class C itself is "formed in view of C" in a very trivial way. A good concept former would be expected to form a class $C_0$ in some "optimal" way, which has not been defined yet.


To avoid the difficulty of generalization, the restricted class of concepts to be considered will be the class of those concepts which are easy to describe in the language available after concept formation. The major point that ought to be emphasized in this section is this: restrictions based on the assumption of simplicity of description have an extremely strong repercussion on what one understands by the "generalizing ability" of a learning algorithm.

"Generalizing ability," in the sense discussed before, can at present be identified most closely with the term "confidence level" as used in the field of statistical hypothesis testing. Also, the slight confusion about the acceptable definition of "concept formation" is reflected remarkably well in the slight confusion that occurs in the use of the term "degrees of freedom" in that field. (The present author is thankful to Professor Herbert Simon of Carnegie Tech for pointing this out.)

This last fact can probably be brought out by an example. Let the experiment consist of exhibiting two-digit decimal representations of the first 99 positive integers, tagging some of the representations with 1 and the others with 0. Let all elements of the set {1, 11, 13, 15, 2?, 33, 35, 47, 49, 59} be tagged with 1, and let the elements of {4, 14, 16, 18, 24, 32, 38, 42, 56, 56} be tagged with 0. If the values of the first and second digits define the only two properties of the environment, one can obtain the following contingency table for a Chi-Square test:


Tagged  with 


Ending  with 


The contingency table yields a value of 27 for the Chi-Square. With 15 degrees of freedom (and with the attendant error arising from the small size of the sample), this has a significance level of 0.025, which may not be considered significant. On the other hand, if evenness of a numeral is considered to be a property of the environment, one could get the following contingency table:

                          Tagged with
                          0        1
second digit even         10       0
second digit odd          0        10

The $\chi^2$ this time, for the same hypothesis of uniform distribution with one degree of freedom, is 20. This value (quite accurately this time) is significant to a level of less than .001.
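The second value can be reproduced with a few lines of arithmetic. The sketch computes Pearson's statistic for the evenness table, without any continuity correction. It then computes the upper-tail probability for one degree of freedom, using the identity $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$.

```python
import math
from typing import Sequence


def chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_one_dof(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


parity_table = [[10, 0],   # second digit even: tagged 0, tagged 1
                [0, 10]]   # second digit odd:  tagged 0, tagged 1
stat = chi_square(parity_table)
p = p_value_one_dof(stat)
```

Every expected count is 5, so each of the four cells contributes 5 to the statistic.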
There is little in the theory of sampling itself to indicate which of the two contingency tables should actually be used for testing significance. It appears that, to interpret the phenomenon indicated above within statistics, the latter may have to be enriched by considerations of the available description language. The suggestion made some paragraphs back, that the generalizable concepts be restricted to easily describable ones, was based on an observation: in any contingency table, the cells are always chosen to be the ones most easily describable.

The number of rows in the contingency table is large if the concept being tested involves the union of a large number of simply describable concepts. This explains the statement in Chapter IV, Section 1, that the "size" of the connective "or" seems to be larger than that of other usual connectives.

It must be pointed out that even when a concept is described as a union of simply describable concepts, generalization is not impossible. In the first contingency table above, for instance, suppose the entries in each cell were doubled. One could then consider the table as a significance indicator for the hypothesis, "All numerals ending in 2, 4, 6 and 8 are tagged with 0." Only more observations would be needed. It may turn out, however, that the structure of the environment and of the language does not permit enough observations in each cell. In that case, generalization is impossible without a change in language.
One can think, for instance, of the concept of "even integers," described as a piecewise linearly separable concept where the only known feature of an integer is its value. Generalization would be impossible on the basis of observing the two tagged sets mentioned above. On the other hand, if one used the digits in the binary representation of the numeral as features, the concept of even numbers would be linearly separable and, hence, easily generalizable.
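The contrast between the two representations can be sketched as follows. With the binary digits of the numeral as features, a single linear function on the lowest bit separates the even numbers; seven bits are enough for 1 to 99. With the value alone, membership flips between every pair of consecutive integers, so a piecewise linear description needs a threshold at each of the 98 changes between 1 and 99. The function names and weights are illustrative.

```python
from typing import List


def binary_features(k: int, width: int = 7) -> List[int]:
    """Binary digits of k, least significant first; 7 bits cover 1..99."""
    return [(k >> i) & 1 for i in range(width)]


def f_even(k: int) -> float:
    """Linear discriminant on the binary features: positive iff the lowest bit is 0."""
    weights = [-1.0] + [0.0] * 6
    return sum(w * xi for w, xi in zip(weights, binary_features(k))) + 0.5


# With the value k as the only feature, membership changes between every pair
# of consecutive integers, so a piecewise linear rule needs a threshold at each change.
changes = sum(1 for k in range(1, 99) if (k % 2 == 0) != ((k + 1) % 2 == 0))
```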

A learning and concept forming algorithm, starting from a fine structure family of properties, could learn concepts by some technique in which the class of learnable concepts is not restricted. Suppose the concepts it learns are not simply describable, and the extremely unfortunate situation discussed in the previous paragraph does not occur. Then the learned concepts need a large number of tagged objects to establish their significance. Once these are established, concepts may be formed to simplify the description of every concept learned. Attempts are then made to learn later concepts within the restriction imposed by the newly formed concepts. If the environment follows the restriction dictated by these newly formed concepts, generalization of the learned concepts will be easier; otherwise the class of formed concepts would have to be modified.

It is the author's belief that the development of algorithms of this kind is essential if Pattern Recognition is to become a viable branch of Artificial Intelligence, and indeed if Artificial Intelligence is ever to become a viable field.

So far the discussions have been carried out against the background of non-numerical description languages and recognition techniques. Analogous discussions remain valid for techniques based on the assumption that the objects are vectors of n real numbers. It may be worthwhile to limit the discussion to polynomial discriminant functions. (It may be repeated here that initially restricting the learnable classes to "linearly separable" or "normally distributed" ones imposes conditions on the initial measurements which are extremely ill-understood.) That is, let C be such that $x \in C \iff P(x) > 0$, where


$P(x) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i,j=1}^{n} a_{ij} x_i x_j + \cdots$
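A quadratic term is enough to make "exclusive or" describable by a polynomial discriminant of the above form. The sketch evaluates P(x) from a dictionary of coefficients keyed by index tuples, with the empty tuple standing for $a_0$. The particular coefficients are an illustrative choice. The cross-term is split symmetrically between $a_{12}$ and $a_{21}$.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def P(x: Sequence[float], coeffs: Dict[Tuple[int, ...], float]) -> float:
    """Evaluate a polynomial stored as {index tuple: coefficient}; () holds a_0."""
    total = 0.0
    for idx, a in coeffs.items():
        term = a
        for i in idx:
            term *= x[i]
        total += term
    return total


# Illustrative coefficients (indices are 0-based, so (0, 1) stands for a_12):
# a_0 = -1/2, a_1 = a_2 = 1, a_12 = a_21 = -1.
xor_coeffs = {(): -0.5, (0,): 1.0, (1,): 1.0, (0, 1): -1.0, (1, 0): -1.0}
concept = {x for x in product((0, 1), repeat=2) if P(x, xor_coeffs) > 0}
```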
It will be assumed that if an element is chosen from the concept C, the probability that a specific x is chosen is $f(x, C)$. Similarly, the probability that x is chosen when an element is chosen from $\bar{C}$ is given as $f(x, \bar{C})$. It will be noted that in Bayesian techniques of Pattern Recognition it is the parameters of f that are estimated. However, the "a"s of the above polynomial are functions of these parameters (note that $f(x, \bar{C}) = 0$ if $P(x) > 0$). It will therefore be assumed that the estimation of the "a"s is the matter at issue.

Now let $k_1$ vectors $(y_1, \ldots, y_{k_1})$ be presented to the learning algorithm tagged with 1, and $k_2$ vectors $(z_1, \ldots, z_{k_2})$ be presented tagged with 0. On the basis of these vectors the algorithm estimates coefficients $b_0, \{b_i\}, \{b_{ij}\}, \ldots$ (Of course, in any practical case, many of the higher degree coefficients will be zero.)

The "b"s are functions of the variables $\{y_1, \ldots, y_{k_1}\}$ and $\{z_1, \ldots, z_{k_2}\}$, which have probability distributions $f(y, C)$ and $f(z, \bar{C})$ respectively. Hence, each b will have a probability distribution and need not be equal to the corresponding a unless $k_1$ and $k_2$ are extremely large. However, if the learning procedure is any good, the distribution of each b will be centered around the corresponding a.
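The behaviour of the "b"s can be simulated under simple assumptions. Assume one real measurement, with $f(x, C)$ and $f(x, \bar{C})$ taken to be unit-variance normal densities with means 2 and 0. Under these assumptions the log-likelihood ratio is linear in x, with coefficient $a_1 = 2$. The difference of the two sample means is then an unbiased estimator $b_1$ of $a_1$, with variance $1/k_1 + 1/k_2$. Repeating the estimate many times shows the distribution of $b_1$ centered on $a_1$ and narrowing as the samples grow. The seed and trial count are arbitrary.

```python
import random
from statistics import fmean, pvariance
from typing import Tuple


def estimate_b1(k1: int, k2: int, mu_c: float, mu_not_c: float, rng: random.Random) -> float:
    """One estimate of a_1 from k1 objects tagged 1 and k2 objects tagged 0."""
    ys = [rng.gauss(mu_c, 1.0) for _ in range(k1)]      # objects drawn from C
    zs = [rng.gauss(mu_not_c, 1.0) for _ in range(k2)]  # objects drawn from the complement
    return fmean(ys) - fmean(zs)


def sampling_distribution(k: int, trials: int = 500, mu_c: float = 2.0,
                          mu_not_c: float = 0.0, seed: int = 1) -> Tuple[float, float]:
    """Mean and variance of b_1 over repeated experiments with k1 = k2 = k."""
    rng = random.Random(seed)
    bs = [estimate_b1(k, k, mu_c, mu_not_c, rng) for _ in range(trials)]
    return fmean(bs), pvariance(bs)


a1 = 2.0 - 0.0  # true coefficient under the stated assumptions
mean_small, var_small = sampling_distribution(10)
mean_large, var_large = sampling_distribution(200)
```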

In the language of the theory of small samples, each coefficient b is an estimator of the corresponding a. In the past, workers in the field have generally been satisfied if the estimates are unbiased. For a discussion of generalization, however, the efficiency of these estimators must be known. The efficiency will be given by the estimation procedure and the distribution $f(x, C)$. However, in the general case, the following discussion is germane.

In any good estimation technique each "b" will have the corresponding "a" as mean. It will also have some variance $\sigma^2$, which will decrease with increasing $k_1$ and $k_2$. However, the rate of convergence can be seen to be seriously restricted by the number of "b"s being estimated.

The number of degrees of freedom is not $k_1 + k_2$ but is reduced by the number of parameters being estimated. As a result, the more parameters are estimated (i.e., the more complex the discriminant function is), the more observations are needed for generalization. The number of parameters can be reduced only by using carefully chosen measurements for the components of the vector. Even though the input properties look "naturally numerical," that is no indication that the natural choice is reflected in any way in the restriction on the concepts being learned.
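The number of parameters grows quickly with the degree of the polynomial. As a simplifying assumption, count each distinct monomial of degree at most d in n variables once, so that $a_{ij}$ and $a_{ji}$ are a single parameter. There are then $\binom{n+d}{d}$ coefficients, and the residual degrees of freedom are $k_1 + k_2$ minus that number.

```python
from math import comb


def num_coefficients(n: int, d: int) -> int:
    """Distinct monomials of degree at most d in n variables: C(n + d, d)."""
    return comb(n + d, d)


def residual_dof(k1: int, k2: int, n: int, d: int) -> int:
    """k1 + k2 tagged objects minus the number of coefficients estimated."""
    return k1 + k2 - num_coefficients(n, d)


for d in (1, 2, 3):
    print(d, num_coefficients(10, d), residual_dof(150, 150, 10, d))
```

For example, take 300 tagged objects and a cubic discriminant in 10 measurements. Under this counting, only 14 degrees of freedom remain.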

An example may make the point clear. Consider the concept {4, 5, 8} in the environment indicated in Figure 1.1. Denote the "natural" property "number of borders" by x and "number of figures" by y. This concept is then represented by the set of vectors {(2,1), (2,2), (3,2)}. The reader may convince himself that this set is not linearly separable. However, suppose new features z and w are defined as follows:

z = 2 if x = 3, y = 1
z = 1 if x = 2, y = 2
z = y otherwise
w = 2 if x = 3, y = 1
w = 3 if x = 2, y = 2
w = x otherwise

Written as (w, z), the concept then appears as the set of vectors {(2,1), (3,1), (3,2)}, which is separated by the linear polynomial $w - z - \tfrac{1}{2}$. A very unnatural numerical measure turns out to be the useful one as far as simplicity of expression is concerned.
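The feature change can be checked mechanically. The sketch maps each (x, y) to (w, z) as defined above and evaluates the separator, read here as $w - z - \tfrac{1}{2}$. It checks the three vectors of the concept and also (3, 1), the one other point that the definitions single out. The remaining objects of Figure 1.1 are not reproduced here, so they are not checked.

```python
from typing import Tuple


def transform(x: int, y: int) -> Tuple[int, int]:
    """Return the new features as (w, z), following the definitions in the text."""
    if (x, y) == (3, 1):
        return 2, 2
    if (x, y) == (2, 2):
        return 3, 1
    return x, y  # w = x and z = y otherwise


def separator(w: int, z: int) -> float:
    return w - z - 0.5


concept = [(2, 1), (2, 2), (3, 2)]  # {4, 5, 8} as (borders, figures)
mapped = [transform(x, y) for x, y in concept]
```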

An alternative mode of feature extraction may be indicated by pointing out that the "features" z and w may well have been looked upon as functions of x and y which rendered the concept separable. The form of the function z(x, y) is seen to be quite complicated. Hence, suppose a non-linear discriminating function had to be constructed by replacing z and w in $w - z - \tfrac{1}{2}$ with complex non-linear functions of x and y. All semblance of simplicity would then be lost. However, suppose one is faced with a large number of highly complex discriminants for a large number of concepts in an environment of n-dimensional vectors $(x_1, x_2, \ldots, x_n)$. Suppose further that one discovers a set of transformations

$y_i = y_i(x_1, \ldots, x_n), \quad 1 \le i \le r,$

such that the original discriminants are simple functions of the $y_i$. One may then consider the y's as the significant features of the environment and use them for subsequent learning and generalization.

In the absence of good measurements (good "features" in the general case), concept formation is an essential adjunct to Pattern Recognition, no matter how sophisticated the modes of combination of the basic predicates may be. As has been said before, past experience has shown that threshold gates are in no way more effective than Boolean gates if the features are not good. On the other hand, recent work has shown that with good features, quite economical switching circuits (with a very manageable number of gates) suffice for recognition.



6. Learning Games by Generalization - Importance of Description Languages

It was shown in Section 9 of Chapter III that the sets $\{W'_i\}$ act as adequate approximations to the evaluations of any Tic-Tac-Toe-like game. It might be noticed that all the basic predicates involved in the description of the sets $\{W'_i\}$ are the same as those involved in the description of Tic-Tac-Toe-like games. Hence, the Universes and fine structure families of properties needed for describing the rules of the games are adequate for the description of the $\{W'_i\}$. However, suppose that in addition to the basic predicates one also used the derived predicates $\#(A) = i$ and $(\exists n)((n, A) \in C \;\&\; (n, B) \in C)$. The description of the $\{W'_i\}$ then becomes much simpler.

However, one important point was not made adequately in the previous discussion: learning descriptions of the $\{W'_i\}$ as combinations of predicates of the above form leads to correct generalization with very little data.

The point is probably best illustrated by an example. Consider any plane (horizontal, vertical or diagonal) in the cubic board with three cells assigned to X and the rest to Λ, as shown in Figure 5.2(a). For convenience, the cells assigned to Λ have been shown empty. Consider any configuration in Qubic which contains a plane like this, with the player to move. That such a configuration is in $W_i$ can be easily seen by a perusal of Figure 5.2(b). Here each $X_i$ represents the i-th move of the player and each $Y_i$ that of the opponent. That this configuration is also a member of K follows from a perusal of Figure 5.2(c) and the accompanying intersection matrix, which can be seen to be merely an alternative representation of a weighted graph. The intersection nodes here have been numbered to bring out the reason why the two sets are the same.

It will be noted that the intersection matrix of Figure 5.2(c) also describes other members of K. Some of these are shown in Figure 5.3. Also, any position equivalent to one of these under the multifarious symmetries of the Qubic board [61] would have the same matrix. Also, a plane with some extra Y's (for instance, with one between $Y_2$ and … in Figure 5.2) would have identical descriptions. So would a configuration which has the same intersection matrix but with some of the lines in a different plane (and all their symmetrical equivalents). This mode of description (the intersection matrix is merely a convenient representation of the statement forms discussed before) is thus more powerful than storing specific positions and considering the symmetries of the board. This latter method has been a favorite in the field. Both Citrenbaum and Koffman have been erroneously criticized for not using it, although it is the less efficient method and is applicable only to Qubic. The number of symmetries in games like Bridg-it and Go-Moku is far smaller, but the description shown here remains equally general.
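The invariance of an intersection matrix under board symmetries can be sketched for a single 4 × 4 plane. The encoding is a hypothetical one chosen for illustration. The diagonal entry for a line counts the player's marks on it. The off-diagonal entry for two lines counts the shared cells not marked by the player; opponent marks are ignored in this toy encoding. Reflecting the whole configuration leaves the matrix unchanged, so one stored matrix covers every symmetric variant.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) in one 4 x 4 plane


def intersection_matrix(lines: List[FrozenSet[Cell]], player: Set[Cell]) -> List[List[int]]:
    """Hypothetical encoding: diagonal = player's marks on each line,
    off-diagonal = number of shared cells not marked by the player."""
    size = len(lines)
    m = [[0] * size for _ in range(size)]
    for i, line in enumerate(lines):
        m[i][i] = len(line & player)
    for i, j in combinations(range(size), 2):
        m[i][j] = m[j][i] = len((lines[i] & lines[j]) - player)
    return m


def row(r: int) -> FrozenSet[Cell]:
    return frozenset((r, c) for c in range(4))


def col(c: int) -> FrozenSet[Cell]:
    return frozenset((r, c) for r in range(4))


def mirror(cell: Cell) -> Cell:
    """Left-right reflection of the plane, one of the board's symmetries."""
    return cell[0], 3 - cell[1]


lines = [row(0), col(0)]
player = {(0, 1), (0, 2), (1, 0)}
m_lines = [frozenset(map(mirror, line)) for line in lines]
m_player = set(map(mirror, player))
```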

This is also a convenient place to point out the following. Suppose that in Figure 5.2 there were an extra Y between … and …. The resulting position would have the same connection matrix but would not be a member of $W_i$. This indicates how some members of $\bigcup_i W'_i$ are not members of $\bigcup_i W_i$.

Figure 5.3

Koffman's learning program is designed to learn descriptions of the $\{W'_i\}$ when their elements occur in the course of a play of the game. Here generalization is very much facilitated by an analysis of the actual course of a game.

The program carries out the analysis as follows.

The first game is played at random by the program until it is defeated (or accidentally wins). The winning move is then removed, and the file uncovered by the process is stored in the trivial matrix as a description of $W'_1$. From then on, no win against the program is possible with a single threat.

When the machine is defeated (or accidentally wins) by a "fork," removal of the winning move reveals a position whose description is already available in memory. At this point the previous move is removed. The line uncovered, together with its intersection with the lines satisfying the previous description, is stored as a new matrix. Now the program blocks all forks and initiates its own forks when possible. Learning continues in a similar fashion in subsequent defeats and accidental wins by deeper forks. Successive previous moves are removed until no match is found with previous descriptions. The last move removed is then analysed to reveal the alternative threat that was blocked. The lines of this threat and those of the previous one, together with their intersection pattern, are then stored in a new matrix.
It ought to be pointed out that without this kind of analysis, descriptions of the $\{W'_i\}$ could still be learned. They would be learned as conjunctions of statements from a large number of examples, by some algorithm analogous to Pennypacker's. However, the above analysis leads to more rapid learning. As a result, Koffman's program needs to play only about 12 games before it defeats its opponent b0% of the time in Qubic and Go-Moku, and it wins in Bridg-it every time it plays first.
9. Approximation to Strategies in Tic-Tac-Toe-Like Games

Tic-Tac-Toe-like games have already been discussed in Section 5. In the present section, certain subsets $\{W'_i\}$ of $S$ (the set of situations) will be discussed. These contain the sets $\{W_i\}$, although they do not coincide with them. In what follows, the definitions of the $W'_i$ will be introduced. It will be shown in Chapter V that descriptions of the $\{W'_i\}$ are much easier to learn than those of the $\{W_i\}$. The significance of this learning will be clarified in Chapter V.

It will be recalled that a Tic-Tac-Toe-like game is completely specified by a set $N$ of cells and two subsets, $\mathcal{G}$ and $\mathcal{B}$, of $2^N$, called the winning and losing files. Given a game $\langle N,\mathcal{G},\mathcal{B}\rangle$, one can define a reduced game $\langle N,\mathcal{G},\emptyset\rangle$ with the same set of cells and winning files, but no losing files. The evaluations of $\langle N,\mathcal{G},\mathcal{B}\rangle$ will be denoted by $\{W_i\}$ and those of $\langle N,\mathcal{G},\emptyset\rangle$ by $\{W'_i\}$. Similarly, the situations to which $(n,X)$ or $(n,Y)$ are applicable will be denoted by $S_{(n,X)}$ and $S_{(n,Y)}$, as before, for $\langle N,\mathcal{G},\mathcal{B}\rangle$, and by $S'_{(n,X)}$ and $S'_{(n,Y)}$ for $\langle N,\mathcal{G},\emptyset\rangle$.

The following theorem indicates how the sets $\{W'_i\}$ act as approximations to the $W_i$.

Theorem 3.22: If $s \in W_i$, then $s \in \bigcup_{j=1}^{i} W'_j$.

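Proof: If $s \in W_1$, then there is an $n \in N$ such that $s \in S_{(n,X)}$ and $s_1 = (n,X)(s) \in W$. However, $s \in S_{(n,X)}$ implies that $s \in S - L$, $|s^{-1}(X)| = |s^{-1}(Y)|$ and $s(n) = \Lambda$. Since $L$ is empty in $\langle N,\mathcal{G},\emptyset\rangle$, this also implies that $s \in S'_{(n,X)}$. Also, $s_1 = (n,X)(s) \in W$ implies that $s_1^{-1}(X) \supseteq A$ for some $A \in \mathcal{G}$, and that $s_1^{-1}(Y) \supseteq B$ for no $B \in \mathcal{B}$. Again, since $\mathcal{B}$ is empty in $\langle N,\mathcal{G},\emptyset\rangle$, this implies that $(n,X)(s) \in W$ for $\langle N,\mathcal{G},\emptyset\rangle$ also. Hence, $s \in W'_1$. The theorem is thus true for $i = 1$.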

Let the theorem be true for $i = k$, and let $s \in W_{k+1}$. If $s \in \bigcup_{j=1}^{k} W'_j$, there is nothing to prove. Otherwise, recall that there exists an $n \in N$ such that $s \in S_{(n,X)}$ and, for each $n'$ such that $(n,X)(s) \in S_{(n',Y)}$,
$$(n',Y)\big((n,X)(s)\big) \in \bigcup_{j=1}^{k} W_j,$$
and hence, by the induction hypothesis, $(n',Y)((n,X)(s)) \in \bigcup_{j=1}^{k} W'_j$. Since $S'_{(n,X)} \supseteq S_{(n,X)}$, as proved before, and since $S'_{(n',Y)} \subseteq S_{(n',Y)}$ can be proved similarly, this implies that $s \in W'_{k+1}$. This proves the theorem.


In what follows, elements of $\bigcup_i W'_i$ will be given an alternative description. It will be easier to test than the exhaustive trial indicated by the definitions used so far in Section 6. For this, the following ideas will have to be introduced.


Let $\alpha$ and $\beta$ be two arbitrary sets, let $C \subseteq \alpha \times \beta$, and let $\#$ be a function mapping the range of the relation $C$ into the integers. Then the pair $\langle C, \# \rangle$ will be called a weighted graph on $\alpha$ and $\beta$.


Given a situation $s$ in a Tic-Tac-Toe-like game, let $\langle C_s, \#_s \rangle$ be the weighted graph on $N$ and $\mathcal{G}$ defined as follows:

(i) $(n,A) \in C_s$ if and only if $s(n) = \Lambda$, $n \in A$ and $A \cap s^{-1}(Y) = \emptyset$;

(ii) $\#_s(A) = |A \cap s^{-1}(\Lambda)|$.

The ideas involved here may perhaps be illustrated by reference to Figure 3.6, which shows some situations in a 3 × 3 Tic-Tac-Toe game. If the cells are numbered 1 to 9 in the usual order, then the files are (1,2,3), (4,5,6), (7,8,9), (1,4,7), (2,5,8), (3,6,9), (1,5,9) and (3,5,7). Call these a to h respectively. The weighted graph of the board shown in Figure 3.6(a) is then

$C = \{(7,c), (7,h), (2,e), (3,h), (3,f), (6,f), (8,e), (8,c)\}$

$\#(c) = 2;\ \#(e) = 2;\ \#(f) = 2;\ \#(h) = 2.$

$C$ and $\#$ are represented in graphical form in Figure 3.6(b).
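Definitions (i) and (ii) can be checked mechanically against this example. The sketch below computes $\langle C_s, \#_s \rangle$ for a 3 × 3 board. Figure 3.6 itself is not reproduced here, so the position used (X on cells 5 and 9, Y on cells 1 and 4, X to move) is an assumption. It is one board consistent with the $C$ and $\#$ listed above. The function name `weighted_graph` and the encoding of the blank $\Lambda$ as `None` are illustrative choices.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Situation = Dict[int, Optional[str]]  # cell -> "X", "Y", or None (the blank, Lambda)

# The files a..h of 3 x 3 Tic-Tac-Toe, named as in the text.
FILES: Dict[str, FrozenSet[int]] = {
    name: frozenset(cells) for name, cells in zip("abcdefgh", [
        (1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7),
        (2, 5, 8), (3, 6, 9), (1, 5, 9), (3, 5, 7)])}


def weighted_graph(s: Situation) -> Tuple[Set[Tuple[int, str]], Dict[str, int]]:
    """Return <C_s, #_s> following definitions (i) and (ii)."""
    # (i) (n, A) is in C_s iff s(n) is blank, n is in A, and A holds no Y.
    C = {(n, name)
         for name, A in FILES.items() if all(s[m] != "Y" for m in A)
         for n in A if s[n] is None}
    # (ii) #_s(A) = number of blank cells of A, for every A in the range of C_s.
    weight = {name: sum(1 for m in FILES[name] if s[m] is None) for _, name in C}
    return C, weight


# A board consistent with the graph listed for Figure 3.6(a), X to move.
s: Situation = {n: None for n in range(1, 10)}
s.update({5: "X", 9: "X", 1: "Y", 4: "Y"})
C, weight = weighted_graph(s)
print(sorted(C))               # [(2, 'e'), (3, 'f'), (3, 'h'), (6, 'f'), (7, 'c'), (7, 'h'), (8, 'c'), (8, 'e')]
print(sorted(weight.items()))  # [('c', 2), ('e', 2), ('f', 2), ('h', 2)]
```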

For an understanding of what follows, it is worthwhile to indicate what happens to the graph $\langle C_s, \#_s \rangle$ as the situation changes as a result of applying controls and disturbances ("moves" and "countermoves"). Two steps of change are indicated in Figures 3.6(c), (d), (e) and (f). The effects indicated in these pictures can be formalized as follows:

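For each element $n$ of $\alpha$, let $X_n$ and $Y_n$ be two functions from weighted graphs to weighted graphs, defined as follows.

$X_n(\langle C, \# \rangle) = \langle C', \#' \rangle$, where
$$C' = [(\alpha - \{n\}) \times \beta] \cap C$$
and
$$\#'(A) = \begin{cases} \#(A) & \text{if } (n,A) \notin C \\ \#(A) - 1 & \text{if } (n,A) \in C. \end{cases}$$

$Y_n(\langle C, \# \rangle) = \langle C', \#' \rangle$, where
$$C' = [(\alpha - \{n\}) \times (\beta - C(n))] \cap C$$
and $\#'(A) = \#(A)$ for all elements $A$ of the range of $C'$. Here $C(n) = \{A : (n,A) \in C\}$ is the set of elements of $\beta$ related to $n$ by $C$.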
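Below is a minimal sketch of $X_n$ and $Y_n$ as operations on a weighted graph. The graph is stored as a set of (cell, file) pairs plus a dictionary of weights. The names `x_op` and `y_op` are hypothetical. Following the definition of a weighted graph, the sketch keeps $\#'$ only on the range of $C'$, so a file whose last blank cell is filled simply drops out. The usage lines apply a move at cell 8 and a countermove at cell 7 to the graph listed for Figure 3.6(a). This sequence is chosen for illustration and is not claimed to be the one drawn in Figures 3.6(c) to (f).

```python
from typing import Dict, FrozenSet, Hashable, Tuple

# A weighted graph <C, #>: (cell, file) pairs plus a weight for every file in the range of C.
Graph = Tuple[FrozenSet[Tuple[Hashable, Hashable]], Dict[Hashable, int]]


def x_op(n: Hashable, g: Graph) -> Graph:
    """X_n: remove cell n; files that contained n lose one unit of weight."""
    C, w = g
    hit = {A for (m, A) in C if m == n}
    C2 = frozenset((m, A) for (m, A) in C if m != n)
    # Weights are kept only on the range of C', as a weighted graph requires.
    return C2, {A: w[A] - (1 if A in hit else 0) for (_, A) in C2}


def y_op(n: Hashable, g: Graph) -> Graph:
    """Y_n: remove cell n and every file in C(n); surviving weights are unchanged."""
    C, w = g
    killed = {A for (m, A) in C if m == n}  # C(n)
    C2 = frozenset((m, A) for (m, A) in C if m != n and A not in killed)
    return C2, {A: w[A] for (_, A) in C2}


# The graph listed for Figure 3.6(a), then a move at cell 8 and a countermove at cell 7.
fig = (frozenset({(7, "c"), (7, "h"), (2, "e"), (3, "h"),
                  (3, "f"), (6, "f"), (8, "e"), (8, "c")}),
       {"c": 2, "e": 2, "f": 2, "h": 2})
after_x = x_op(8, fig)
after_y = y_op(7, after_x)
print(sorted(after_x[1].items()))  # [('c', 1), ('e', 1), ('f', 2), ('h', 2)]
print(sorted(after_y[0]))          # [(2, 'e'), (3, 'f'), (6, 'f')]
```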

Theorem 3.23: In any Tic-Tac-Toe-like game and any situation $s$,
$$\langle C_{(n,X)(s)}, \#_{(n,X)(s)} \rangle = X_n(\langle C_s, \#_s \rangle)$$
$$\langle C_{(n,Y)(s)}, \#_{(n,Y)(s)} \rangle = Y_n(\langle C_s, \#_s \rangle)$$
whenever the left-hand sides are defined.

Proof: Let $(n,X)(s)$ be defined, i.e., let $s \in S_{(n,X)}$. Then $s(n) = \Lambda$. If $(n,X)(s) = s_1$, then $s_1(n) = X$ and, for all $m \in N$, $m \neq n$ implies $s_1(m) = s(m)$; in particular $s_1^{-1}(Y) = s^{-1}(Y)$. Hence, $(m,A) \in C_{s_1}$ if and only if $(m,A) \in C_s$ and $m \neq n$. Also, for any $A$ in the range of $C_{s_1}$, $|A \cap s_1^{-1}(\Lambda)| = |A \cap s^{-1}(\Lambda)|$ unless $n \in A$, i.e., unless $(n,A) \in C_s$. In that case $|A \cap s_1^{-1}(\Lambda)| = |A \cap s^{-1}(\Lambda)| - 1$. This proves the first part of the theorem. The proof of the second part is left to the reader.
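Theorem 3.23 concerns finitely many situations for any given game, so it can be checked exhaustively for 3 × 3 Tic-Tac-Toe. The sketch below does this under the following assumptions. It compares both sides for every assignment of X, Y and $\Lambda$ to the nine cells in which X or Y is due to move (equal counts, or one more X than Y), and for every blank cell. That is a superset of the situations where the moves are actually defined. The extra cases are harmless because the identities hold whenever the cell moved to is blank. All function names are illustrative.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Tuple

Situation = Dict[int, Optional[str]]  # cell -> "X", "Y", or None (blank)
Graph = Tuple[FrozenSet[Tuple[int, str]], Dict[str, int]]

# Files a..h of 3 x 3 Tic-Tac-Toe, as named in the text.
FILES = {name: frozenset(cells) for name, cells in zip("abcdefgh", [
    (1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7),
    (2, 5, 8), (3, 6, 9), (1, 5, 9), (3, 5, 7)])}


def weighted_graph(s: Situation) -> Graph:
    """<C_s, #_s>: blank cells of Y-free files; weight = number of blanks in the file."""
    C = frozenset((n, a) for a, A in FILES.items() if all(s[m] != "Y" for m in A)
                  for n in A if s[n] is None)
    return C, {a: sum(1 for m in FILES[a] if s[m] is None) for _, a in C}


def x_op(n: int, g: Graph) -> Graph:
    """X_n: drop cell n; files that contained n lose one unit of weight."""
    C, w = g
    hit = {a for m, a in C if m == n}
    C2 = frozenset((m, a) for m, a in C if m != n)
    return C2, {a: w[a] - (1 if a in hit else 0) for _, a in C2}


def y_op(n: int, g: Graph) -> Graph:
    """Y_n: drop cell n and every file in C(n); other weights unchanged."""
    C, w = g
    killed = {a for m, a in C if m == n}
    C2 = frozenset((m, a) for m, a in C if m != n and a not in killed)
    return C2, {a: w[a] for _, a in C2}


def check_theorem_3_23() -> int:
    """Compare both sides of Theorem 3.23; return the number of cases compared."""
    checked = 0
    for cells in product(("X", "Y", None), repeat=9):
        nx, ny = cells.count("X"), cells.count("Y")
        if nx not in (ny, ny + 1):
            continue  # neither player is due to move
        player, op = ("X", x_op) if nx == ny else ("Y", y_op)
        s = dict(zip(range(1, 10), cells))
        g = weighted_graph(s)
        for n in [m for m in s if s[m] is None]:
            t = dict(s)
            t[n] = player  # apply (n, X) or (n, Y)
            assert weighted_graph(t) == op(n, g), (s, n)
            checked += 1
    return checked


print(check_theorem_3_23())
```

If every comparison succeeds, the function returns the number of (situation, cell) pairs it compared.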

For an alternative description of the sets $\{W'_i\}$, one has to define the following class $\{\mathcal{K}_i\}$ of sets of weighted graphs.

A weighted graph $\langle C, \# \rangle$ belongs to $\mathcal{K}_1$ if and only if there is an $A$ in the range of $C$ such that $\#(A) = 1$.

$\mathcal{K}_{i+1}$ is defined for $i \geq 1$ as follows.

Let $\langle C', \#' \rangle$ be any graph which is a member of $\bigcup_{j=1}^{i} \mathcal{K}_j$ and such that, for all $n$, $Y_n(\langle C', \#' \rangle) \in \bigcup_{j=1}^{i} \mathcal{K}_j$. Let $C_1, C_2, \ldots, C_l$ be the set of all subgraphs of $C'$ such that, for each $p$ ($1 \leq p \leq l$), $\langle C_p, \#_p \rangle$ is a member of $\bigcup_{j=1}^{i} \mathcal{K}_j$. Here $\#_p$ is the restriction of $\#'$ to the range of $C_p$. Let $\{A_1, A_2, \ldots, A_m\}$ be a set of elements in the range of $C'$ such that there is at least one $A_q$ ($1 \leq q \leq m$) in the range of each $C_p$ ($1 \leq p \leq l$). Let $n$ be any element of $\alpha$ not in the domain of $C'$. Let $\langle C'', \#'' \rangle$ be constructed as follows:
$$C'' = C' \cup \{(n,A_1), (n,A_2), \ldots, (n,A_m)\}$$
$$\#''(A) = \begin{cases} \#'(A) + 1 & \text{if } A = A_q \ (1 \leq q \leq m) \\ \#'(A) & \text{otherwise.} \end{cases}$$

A weighted graph belongs to $\mathcal{K}_{i+1}$ if and only if it does not belong to $\bigcup_{j=1}^{i} \mathcal{K}_j$ but has $\langle C'', \#'' \rangle$ above as a subgraph.
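The construction of $\langle C'', \#'' \rangle$ can be sketched for the case $i = 1$ as follows. The names `extend` and `x_op` are hypothetical. The starting graph $\langle C', \#' \rangle$ is the Figure 3.6(a) graph after a move at cell 8, in which files c and e have weight 1. It belongs to $\mathcal{K}_1$, and every $Y_n$ leaves c or e intact, so it meets the conditions of the definition. Every subgraph of it in $\mathcal{K}_1$ contains c or e in its range, so $A_1 = c$, $A_2 = e$ satisfy the requirement on the $A_q$. Re-attaching cell 8 to these two files rebuilds the Figure 3.6(a) graph. Applying $X_8$ then recovers $\langle C', \#' \rangle$, the fact used in the proof of Theorem 3.24.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Graph = Tuple[FrozenSet[Tuple[Hashable, Hashable]], Dict[Hashable, int]]


def extend(n: Hashable, files: Iterable[Hashable], g: Graph) -> Graph:
    """Build <C'', #''> from <C', #'>: attach the new cell n to each chosen file A_q
    and raise the weight of each A_q by one."""
    C, w = g
    chosen = set(files)
    if any(m == n for m, _ in C):
        raise ValueError("n must not be in the domain of C'")
    if not chosen <= {A for _, A in C}:
        raise ValueError("every A_q must lie in the range of C'")
    C2 = C | frozenset((n, A) for A in chosen)
    return C2, {A: w[A] + (1 if A in chosen else 0) for _, A in C2}


def x_op(n: Hashable, g: Graph) -> Graph:
    """X_n: remove cell n; files that contained n lose one unit of weight."""
    C, w = g
    hit = {A for m, A in C if m == n}
    C2 = frozenset((m, A) for m, A in C if m != n)
    return C2, {A: w[A] - (1 if A in hit else 0) for _, A in C2}


# <C', #'>: the Figure 3.6(a) graph after an X move at cell 8 (files c and e now weigh 1).
c_prime = (frozenset({(7, "c"), (7, "h"), (2, "e"), (3, "h"), (3, "f"), (6, "f")}),
           {"c": 1, "e": 1, "f": 2, "h": 2})
c_double = extend(8, ["c", "e"], c_prime)
print(sorted(c_double[1].items()))  # [('c', 2), ('e', 2), ('f', 2), ('h', 2)]
print(x_op(8, c_double) == c_prime)  # True
```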


Theorem 3.24: If a graph $\langle C, \# \rangle$ belongs to $\mathcal{K}_{i+1}$, there exists an $n$ in the domain of $C$ such that, for all $n'$,
$$Y_{n'}(X_n(\langle C, \# \rangle)) \in \bigcup_{j=1}^{i} \mathcal{K}_j.$$

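Proof: By definition of $\mathcal{K}_{i+1}$, $\langle C, \# \rangle$ has no subgraph belonging to $\bigcup_{j=1}^{i} \mathcal{K}_j$. Also, there is a graph $\langle C', \#' \rangle$ and a subgraph $\langle C'', \#'' \rangle$ of $\langle C, \# \rangle$ such that $\langle C'', \#'' \rangle$ is constructed from $\langle C', \#' \rangle$ as described in the definition of $\mathcal{K}_{i+1}$. Let $n$ be a member of $\alpha$ which occurs in $C''$ but not in $C'$. From the construction of $\langle C'', \#'' \rangle$ it is evident that
$$X_n(\langle C'', \#'' \rangle) = \langle C', \#' \rangle.$$
Since $\langle C'', \#'' \rangle$ is a subgraph of $\langle C, \# \rangle$, $\langle C', \#' \rangle$ is a subgraph of $X_n(\langle C, \# \rangle)$. For all $n'$, $Y_{n'}(\langle C', \#' \rangle) \in \bigcup_{j=1}^{i} \mathcal{K}_j$. Since $\langle C', \#' \rangle$ is a subgraph of $X_n(\langle C, \# \rangle)$, it follows that for all $n'$,
$$Y_{n'}(X_n(\langle C, \# \rangle)) \in \bigcup_{j=1}^{i} \mathcal{K}_j.$$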
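For $i = 1$, the conclusion of Theorem 3.24 can be tested directly. One looks for a cell $n$ such that, after $X_n$, every $Y_{n'}$ still leaves a file of weight 1. The sketch below does this on the graph listed for Figure 3.6(a). It checks only this necessary condition, not full membership in $\mathcal{K}_2$. The names `in_k1` and `forcing_cells` are hypothetical. On this graph it reports cells 3, 7 and 8. Each of them creates two threats that a single countermove cannot both block.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Graph = Tuple[FrozenSet[Tuple[Hashable, Hashable]], Dict[Hashable, int]]


def x_op(n: Hashable, g: Graph) -> Graph:
    """X_n: remove cell n; files that contained n lose one unit of weight."""
    C, w = g
    hit = {A for m, A in C if m == n}
    C2 = frozenset((m, A) for m, A in C if m != n)
    return C2, {A: w[A] - (1 if A in hit else 0) for _, A in C2}


def y_op(n: Hashable, g: Graph) -> Graph:
    """Y_n: remove cell n and every file in C(n); surviving weights are unchanged."""
    C, w = g
    killed = {A for m, A in C if m == n}
    C2 = frozenset((m, A) for m, A in C if m != n and A not in killed)
    return C2, {A: w[A] for _, A in C2}


def in_k1(g: Graph) -> bool:
    """K_1: some file in the range of C has weight 1."""
    return any(v == 1 for v in g[1].values())


def forcing_cells(g: Graph) -> Set[Hashable]:
    """Cells n in the domain of C such that Y_{n'}(X_n(g)) is in K_1 for every n'."""
    result = set()
    for n in {m for m, _ in g[0]}:
        after = x_op(n, g)
        # A reply outside the domain of C' leaves C' unchanged, so C' itself must be in K_1.
        if in_k1(after) and all(in_k1(y_op(m, after)) for m, _ in after[0]):
            result.add(n)
    return result


fig = (frozenset({(7, "c"), (7, "h"), (2, "e"), (3, "h"),
                  (3, "f"), (6, "f"), (8, "e"), (8, "c")}),
       {"c": 2, "e": 2, "f": 2, "h": 2})
print(sorted(forcing_cells(fig)))  # [3, 7, 8]
```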


Theorem 3.25: For any Tic-Tac-Toe-like game, $s \in W'_k$ if and only if $|s^{-1}(X)| = |s^{-1}(Y)|$, $s^{-1}(X) \not\supseteq A$ for any $A \in \mathcal{G}$, and $\langle C_s, \#_s \rangle \in \mathcal{K}_k$.

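Proof: Let $k = 1$.

Suppose $\langle C_s, \#_s \rangle \in \mathcal{K}_1$. Then there exists an $A \in \mathcal{G}$ such that $|s^{-1}(\Lambda) \cap A| = 1$ and $s^{-1}(Y) \cap A = \emptyset$. That is, $s(m) = X$ for all cells $m$ in $A$ except one, and for that one cell $n \in A$, $s(n) = \Lambda$. Since $|s^{-1}(X)| = |s^{-1}(Y)|$ and $s^{-1}(X) \not\supseteq A'$ for any $A' \in \mathcal{G}$, $s \in S'_{(n,X)}$. Also, if $(n,X)(s) = s_1$, then $s_1(n) = X$ and $s_1(m) = s(m) = X$ for all other cells $m$ of $A$. Thus $s_1^{-1}(X) \supseteq A$, and $(n,X)(s) \in W'$, the set of won situations of $\langle N,\mathcal{G},\emptyset\rangle$. Hence, $s \in W'_1$.

Let now $s \in W'_1$, so that there exists an $n \in N$ such that $(n,X)(s) \in W'$. Let $(n,X)(s) = s_1$. Now $s_1^{-1}(X) = s^{-1}(X) \cup \{n\}$. Since $s_1 \in W'$, there is an $A \in \mathcal{G}$ such that $s^{-1}(X) \cup \{n\} \supseteq A$. However, since $s \notin W'$, $s^{-1}(X) \not\supseteq A$. Hence, $n \in A$, and $s(m) = X$ for all $m \in A$ such that $m \neq n$. Since $s(n) = \Lambda$ and $A$ contains no $Y$, $(n,A) \in C_s$. Moreover $\#_s(A) = 1$, and therefore $\langle C_s, \#_s \rangle \in \mathcal{K}_1$.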
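The case $k = 1$ can also be confirmed by brute force on 3 × 3 Tic-Tac-Toe. As in the proof above, the sketch assumes that $S'_{(n,X)}$ consists of the situations with equal numbers of X and Y, no file wholly held by X, and $s(n) = \Lambda$. It also assumes that a situation is won in the reduced game as soon as X holds a whole file. Y's files are ignored, since $L'$ is empty. The sketch then compares membership in $W'_1$ with membership of $\langle C_s, \#_s \rangle$ in $\mathcal{K}_1$ for every such situation. Function names are illustrative.

```python
from itertools import product
from typing import Dict, Optional

Situation = Dict[int, Optional[str]]  # cell -> "X", "Y", or None (blank)

# The eight files of 3 x 3 Tic-Tac-Toe; cells numbered 1..9.
FILES = [frozenset(t) for t in [(1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7),
                                (2, 5, 8), (3, 6, 9), (1, 5, 9), (3, 5, 7)]]


def x_holds_a_file(s: Situation) -> bool:
    return any(all(s[m] == "X" for m in A) for A in FILES)


def in_w1_reduced(s: Situation) -> bool:
    """Assumed W'_1: X to move, no X file yet, and one X move completes a file.
    Y's files play no role because L' is empty in the reduced game."""
    marks = list(s.values())
    if marks.count("X") != marks.count("Y") or x_holds_a_file(s):
        return False
    return any(x_holds_a_file({**s, n: "X"}) for n in s if s[n] is None)


def graph_in_k1(s: Situation) -> bool:
    """<C_s, #_s> in K_1: a Y-free file (hence in the range of C_s) has #_s = 1."""
    return any(all(s[m] != "Y" for m in A) and sum(1 for m in A if s[m] is None) == 1
               for A in FILES)


def check_theorem_3_25_base() -> int:
    """Compare the two sides of Theorem 3.25 for k = 1; return the number of cases."""
    checked = 0
    for cells in product(("X", "Y", None), repeat=9):
        s = dict(zip(range(1, 10), cells))
        if cells.count("X") == cells.count("Y") and not x_holds_a_file(s):
            assert in_w1_reduced(s) == graph_in_k1(s), s
            checked += 1
    return checked


print(check_theorem_3_25_base())
```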

Let now the theorem be true for all $k \leq i$. Let $\langle C_s, \#_s \rangle \in \mathcal{K}_{i+1}$, where $|s^{-1}(X)| = |s^{-1}(Y)|$ and $s^{-1}(X) \not\supseteq A$ for any $A \in \mathcal{G}$. Then by Theorem 3.24 there exists an $n$ in the domain of $C_s$ such that, for all $n'$,
$$Y_{n'}(X_n(\langle C_s, \#_s \rangle)) \in \bigcup_{j=1}^{i} \mathcal{K}_j.$$
Since $n$ is in the domain of $C_s$, $s(n) = \Lambda$. Together with the two conditions on $s$, this gives $s \in S'_{(n,X)}$, and hence $(n,X)(s)$ is defined. Also, whenever $(n',Y)((n,X)(s))$ is defined, one has by Theorem 3.23
$$\langle C_{(n',Y)((n,X)(s))}, \#_{(n',Y)((n,X)(s))} \rangle = Y_{n'}(X_n(\langle C_s, \#_s \rangle)) \in \bigcup_{j=1}^{i} \mathcal{K}_j.$$
Hence, by the induction hypothesis, there exists an $n$ such that $s \in S'_{(n,X)}$ and, for all $n'$ such that $(n,X)(s) \in S'_{(n',Y)}$,
$$(n',Y)((n,X)(s)) \in \bigcup_{j=1}^{i} W'_j.$$
Hence, $s \in W'_{i+1}$.


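Let now $s \in W'_{i+1}$. Then there exists an $n$ such that $s \in S'_{(n,X)}$ and, for all $n'$ such that $(n,X)(s) \in S'_{(n',Y)}$,
$$(n',Y)((n,X)(s)) \in \bigcup_{j=1}^{i} W'_j,$$
and hence, by the induction hypothesis and Theorem 3.23,
$$Y_{n'}(X_n(\langle C_s, \#_s \rangle)) \in \bigcup_{j=1}^{i} \mathcal{K}_j.$$
By definition of $Y_{n'}$, $Y_{n'}(X_n(\langle C_s, \#_s \rangle))$ is a subgraph of $X_n(\langle C_s, \#_s \rangle)$. So $X_n(\langle C_s, \#_s \rangle)$ has subgraphs $\langle C_p, \#_p \rangle$ ($1 \leq p \leq l$) belonging to $\bigcup_{j=1}^{i} \mathcal{K}_j$. Also, none of the $\langle C_p, \#_p \rangle$ is a subgraph of $\langle C_s, \#_s \rangle$, since $\langle C_s, \#_s \rangle \notin \bigcup_{j=1}^{i} \mathcal{K}_j$ by the induction hypothesis. Hence, the range of each of these subgraphs must contain a file $A$ with $(n,A) \in C_s$. So $\langle C_s, \#_s \rangle$ has a subgraph which is obtained from the $\langle C_p, \#_p \rangle$ by the construction shown in the definition of $\mathcal{K}_{i+1}$. Hence, $\langle C_s, \#_s \rangle \in \mathcal{K}_{i+1}$.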

The reason for introducing the above theorems is this. The only predicates needed for the recognition of members of $\bigcup_i \mathcal{K}_i$ are the values of $\#_s$ for the different files and, in view of the construction of $\mathcal{K}_{i+1}$ from $\bigcup_{j=1}^{i} \mathcal{K}_j$, the fact that $s(n) = \Lambda$ for some cell $n$ common to a number of files. Consider a situation $s \in \{X, Y, \Lambda\}^N$, i.e., an assignment of $X$, $Y$ and $\Lambda$ to the cells. For such an $s$, the search for files with given values of $\#_s$ that have certain cells of $s^{-1}(\Lambda)$ in common is much more directed than the exhaustive minimax searches indicated by the definition of the $\{W'_i\}$.
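As an illustration of such a directed search, the sketch below looks only at the values of $\#_s$ and at blank cells shared between files. It returns the blank cells $n$ that lie on two Y-free files of weight 2 whose remaining blank cells differ. These are cells at which X can create two threats at once. It is a simplified, hypothetical example for 3 × 3 Tic-Tac-Toe; the name `fork_cells` is not from the text. It ignores whether either side already has an immediate win, which a complete player would test first. On the board with X at 5 and 9 and Y at 1 and 4 (one board consistent with Figure 3.6(a)), it finds cells 3, 7 and 8 without any look-ahead.

```python
from typing import Dict, FrozenSet, Optional, Set

Situation = Dict[int, Optional[str]]  # cell -> "X", "Y", or None (blank)

FILES: Dict[str, FrozenSet[int]] = {
    name: frozenset(cells) for name, cells in zip("abcdefgh", [
        (1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7),
        (2, 5, 8), (3, 6, 9), (1, 5, 9), (3, 5, 7)])}


def weighted_graph(s: Situation):
    """<C_s, #_s> as in definitions (i) and (ii)."""
    C = {(n, a) for a, A in FILES.items() if all(s[m] != "Y" for m in A)
         for n in A if s[n] is None}
    return C, {a: sum(1 for m in FILES[a] if s[m] is None) for _, a in C}


def fork_cells(s: Situation) -> Set[int]:
    """Blank cells n with (n, A) and (n, B) in C_s, #_s(A) = #_s(B) = 2,
    and different blank cells left in A and B once n is taken."""
    C, w = weighted_graph(s)
    result = set()
    for n in {m for m, _ in C}:
        # The single blank cell each weight-2 file through n would still have.
        left = {next(m for m in FILES[a] if m != n and s[m] is None)
                for m0, a in C if m0 == n and w[a] == 2}
        if len(left) >= 2:  # one countermove cannot fill two different cells
            result.add(n)
    return result


board: Situation = {n: None for n in range(1, 10)}
board.update({5: "X", 9: "X", 1: "Y", 4: "Y"})
print(sorted(fork_cells(board)))  # [3, 7, 8]
```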

The difficulty with describing the $\{W_i\}$ through the $\mathcal{K}_i$, however, is that $\bigcup_i W'_i$ contains $\bigcup_i W_i$ but does not coincide with it. Hence, the $\mathcal{K}_i$ are only approximations to the $W_i$. The reason for this is that the converse of Theorem 3.22 is not true. One reason for this, in turn, is that a situation may be an element of $L$ in $\langle N,\mathcal{G},\mathcal{B}\rangle$ and yet be a member of $\bigcup_i W'_i$ only because $L'$ is empty in $\langle N,\mathcal{G},\emptyset\rangle$. An example of this will be given in Chapter V, Section 6.

However, the difference between $\{W_i\}$ and $\{W'_i\}$ arises mainly from the emptiness of $L'$. Because of this, a state in $\bigcup_i W'_i$ can also be tested for membership in $\bigcup_i W_i$ by a fairly well-directed search. A method for doing this has been pointed out by Citrenbaum [62].

Another very important reason for using the $\{\mathcal{K}_i\}$ as approximations to the $\{W_i\}$ concerns generalization. The $\{\mathcal{K}_i\}$ are obtainable from a specific mode of combination of statements of the form $\#_s(A) = i$ and $(\exists n)((n,A) \in C_s \,\&\, (n,B) \in C_s)$, and so they lead to easy generalizations from examples. A learning program based on such generalization was developed by Koffman [69] and will be discussed in Section 6 of Chapter V. The descriptions learned by this program are used by a game-playing program to make very deep forcing moves during the play of any Tic-Tac-Toe-like game. In this sense, the program is game-independent within this class of games. Given any game $\langle N,\mathcal{G},\mathcal{B}\rangle$, it plays the game legally and, on the basis of its experience, improves its play, often to the point of defeating its opponent.
13. ABSTRACT

This report considers the effect that specific language descriptions have on the efficiency of pattern recognition and problem solving methods. The efficiency of a language for the description of a given set is viewed in terms of the size, in some sense, of the shortest expression that denotes the set. Central to the discussion are two questions. The first is how the description of a concept should be stored so as to use the smallest amount of memory. The second is how the description should be stored and processed so that, given an object and a concept, it can be determined efficiently whether the concept contains the object. The languages are essentially non-numeric, enabling pre-processing to be described in the same format as that used for pattern description. Algorithms for learning and generalization are presented that use two of the languages. The property of succinctness is considered for the algorithms, and the effect of a lack of succinctness on the statistical degree of confidence in the learned description is indicated. Analogies are made to descriptions in terms of discriminant functions and maximum likelihood ratios.